Commit | Line | Data |
---|---|---|
57625988 DD |
1 | To: rsync@lists.samba.org |
2 | Subject: Rsync performance increase through buffering | |
3 | From: Craig Barratt <craig@atheros.com> | |
4 | Message-ID: <auto-000002369223@atheros.com> | |
5 | Date: Sun, 08 Dec 2002 23:48:57 -0800 | |
6 | ||
7 | I've been studying the read and write buffering in rsync and it turns | |
8 | out most I/O is done just a couple of bytes at a time. This means there | |
9 | are lots of system calls, and also most network traffic comprises lots | |
10 | of small packets. The behavior is most extreme when sending/receiving | |
11 | file deltas of identical files. | |
12 | ||
13 | The main case where I/O is buffered is writes from the server (when | |
14 | io multiplexing is on). These are usually buffered in 4092 byte | |
15 | chunks with a 4 byte header. However, reading of these packets is | |
16 | usually unbuffered, and writes from the client are generally not | |
17 | buffered. For example: when receiving 1st phase checksums (6 bytes | |
18 | per block), 2 reads are done: one of 4 bytes and one of 2 bytes, | |
19 | meaning there are 4 system calls (select/read/select/read) per 6 | |
20 | bytes of checksum data). | |
21 | ||
22 | One cost of this is some performance, but a significant issue is that | |
23 | unbuffered writes generate very short (and very many) ethernet packets, | |
24 | which means the overhead is quite large on slow network connections. | |
25 | ||
26 | The initial file_list writing is typically buffered, but reading it on | |
27 | the client is not. | |
28 | ||
29 | There are some other unneeded system calls: | |
30 | ||
31 | - One example is that show_progress() calls gettimeofday() even | |
32 | if do_progress is not set. show_progress() is called on every | |
33 | block, so there is an extra system call per (700 byte) block. | |
184753c5 WD |
34 | (NOTE: this was already fixed in 2.6.0, so it is no longer in |
35 | this patch.) | |
57625988 DD |
36 | |
37 | - Another example is that file_write writes each matching (700 byte) | |
38 | block without buffering, so that's another system call per block. | |
39 | ||
40 | To study this behavior I used rsync-2.5.6cvs and had a benchmark area | |
41 | comprising around 7800 files of total size 530MB. | |
42 | ||
43 | Here are some results doing sends and receives via rsyncd, all on the | |
44 | same machine, with identical source and destination files. In each | |
45 | case --ignore-times (-I) is set, so that every file is processed: | |
46 | ||
47 | - Send test: | |
48 | ||
49 | strace -f rsync -Ir . localhost::test |& wc | |
50 | ||
51 | shows there are about 2,488,775 system calls. | |
52 | ||
53 | - Receive test: | |
54 | ||
55 | strace -f rsync -Ir localhost::test . |& wc | |
56 | ||
57 | shows there are about 1,615,931 system calls. | |
58 | ||
59 | - Rsyncd has a roughly similar numbers of system calls. | |
60 | ||
61 | - Send test from another machine (cygwin/WinXP laptop): | |
62 | ||
63 | tcpdump port 873 |& wc | |
64 | ||
65 | shows there are about 701,111 ethernet packets (many of them only | |
66 | have a 4 byte payload). | |
67 | ||
68 | Since the source and dest files are the same, the send test only | |
69 | wrote 1,738,797 bytes and read 2,139,848 bytes. | |
70 | ||
71 | These results are similar to rsync 2.5.5. | |
72 | ||
73 | Below is a patch to a few files that adds read and write buffering in | |
74 | the places where the I/O was unbuffered, adds buffering to write_file() | |
75 | and removes the unneeded gettimeofday() system call in show_progress(). | |
76 | ||
77 | The results with the patch are: | |
78 | ||
79 | - Send test: 46,835 system calls, versus 2,488,775. | |
80 | ||
81 | - Receive test: 138,367 system calls, versus 1,615,931. | |
82 | ||
83 | - Send test from another machine: 5,255 ethernet packets, versus 701,111. | |
84 | If the tcp/ip/udp/802.3 per-packet overhead is around 60 bytes, that | |
85 | means the base case transfers an extra 42MB of data, even though the | |
86 | useful data is only around 2MB. | |
87 | ||
88 | The absolute running time on the local rsyncd test isn't much different, | |
89 | probably because the test is really disk io limited and system calls on | |
90 | an unloaded linux system are pretty fast. | |
91 | ||
92 | However, on a network test doing a send from cygwin/WinXP to rsyncd | |
93 | on rh-linux the running time improves from about 700 seconds to 215 | |
94 | seconds (with a cpu load of around 17% versus 58%, if you believe | |
95 | cygwin's cpu stats). This is probably an extreme case since the system | |
96 | call penalty in cygwin is high. But I would suspect a significant | |
97 | improvement is possible with a slow network connection, since a lot | |
98 | less data is being sent. | |
99 | ||
100 | Note also that without -I rsync is already very fast, since it skips | |
101 | (most) files based on attributes. | |
102 | ||
103 | With or without this patch the test suite passes except for | |
104 | daemon-gzip-upload. One risk of buffering is the potential for | |
105 | a bug caused by a missing io_flush: deadlock is possible, so try | |
106 | the patch at your own risk... | |
107 | ||
108 | Craig | |
109 | ||
110 | ########################################################################### | |
184753c5 WD |
111 | Note: This is an updated patch for 2.6.0, created by Wayne Davison. |
112 | ########################################################################### | |
113 | --- orig/fileio.c 6 Dec 2003 21:07:27 -0000 1.8 | |
114 | +++ rsync-2.6.0/fileio.c 16 Dec 2003 19:12:29 -0000 | |
115 | @@ -69,6 +69,21 @@ | |
116 | return len; | |
117 | } | |
57625988 | 118 | |
57625988 | 119 | + |
184753c5 WD |
120 | +static char *wf_writeBuf; |
121 | +static size_t wf_writeBufSize; | |
122 | +static size_t wf_writeBufCnt; | |
123 | + | |
124 | +int flush_write_file(int f) | |
125 | +{ | |
126 | + int ret = write(f, wf_writeBuf, wf_writeBufCnt); | |
127 | + if (ret < 0) | |
57625988 | 128 | + return ret; |
184753c5 WD |
129 | + /* if (ret < wf_writeBufCnt) ??? */ |
130 | + wf_writeBufCnt = 0; | |
131 | + return ret; | |
132 | +} | |
133 | + | |
134 | /* | |
135 | * write_file does not allow incomplete writes. It loops internally | |
136 | * until len bytes are written or errno is set. | |
137 | @@ -83,7 +98,22 @@ | |
138 | int len1 = MIN(len, SPARSE_WRITE_SIZE); | |
139 | r1 = write_sparse(f, buf, len1); | |
140 | } else { | |
141 | - r1 = write(f, buf, len); | |
142 | + if (!wf_writeBuf) { | |
143 | + wf_writeBufSize = MAX_MAP_SIZE; | |
144 | + wf_writeBufCnt = 0; | |
145 | + wf_writeBuf = new_array(char, MAX_MAP_SIZE); | |
146 | + if (!wf_writeBuf) out_of_memory("write_file"); | |
147 | + } | |
148 | + r1 = MIN(len, wf_writeBufSize - wf_writeBufCnt); | |
149 | + if (r1) { | |
150 | + memcpy(wf_writeBuf + wf_writeBufCnt, buf, r1); | |
151 | + wf_writeBufCnt += r1; | |
152 | + } | |
153 | + if (wf_writeBufCnt == wf_writeBufSize) { | |
154 | + flush_write_file(f); | |
155 | + if (!r1 && len) | |
156 | + continue; | |
157 | + } | |
158 | } | |
159 | if (r1 <= 0) { | |
160 | if (ret > 0) return ret; | |
161 | --- orig/flist.c 15 Dec 2003 08:10:31 -0000 1.144 | |
162 | +++ rsync-2.6.0/rsync-2.6.0/flist.c 16 Dec 2003 19:12:30 -0000 | |
163 | @@ -925,7 +925,7 @@ | |
57625988 DD |
164 | flist = flist_new(); |
165 | ||
166 | if (f != -1) { | |
167 | - io_start_buffering(f); | |
168 | + io_start_buffering_out(f); | |
184753c5 WD |
169 | if (filesfrom_fd >= 0) { |
170 | if (argv[0] && !push_dir(argv[0], 0)) { | |
171 | rprintf(FERROR, "push_dir %s failed: %s\n", | |
172 | --- orig/io.c 15 Dec 2003 00:54:44 -0000 1.112 | |
173 | +++ rsync-2.6.0/rsync-2.6.0/io.c 16 Dec 2003 19:12:30 -0000 | |
57625988 DD |
174 | @@ -41,8 +41,8 @@ |
175 | ||
176 | static int io_multiplexing_out; | |
177 | static int io_multiplexing_in; | |
178 | -static int multiplex_in_fd; | |
179 | -static int multiplex_out_fd; | |
180 | +static int multiplex_in_fd = -1; | |
181 | +static int multiplex_out_fd = -1; | |
182 | static time_t last_io; | |
183 | static int no_flush; | |
184 | ||
184753c5 | 185 | @@ -441,17 +441,31 @@ |
57625988 DD |
186 | static size_t remaining; |
187 | int tag, ret = 0; | |
188 | char line[1024]; | |
184753c5 WD |
189 | + static char *buffer; |
190 | + static size_t bufferIdx = 0; | |
191 | + static size_t bufferSz; | |
57625988 DD |
192 | |
193 | - if (!io_multiplexing_in || fd != multiplex_in_fd) | |
194 | + if (fd != multiplex_in_fd) | |
195 | return read_timeout(fd, buf, len); | |
196 | ||
197 | + if (!io_multiplexing_in && remaining == 0) { | |
198 | + if (!buffer) { | |
199 | + bufferSz = 2 * IO_BUFFER_SIZE; | |
184753c5 | 200 | + buffer = new_array(char, bufferSz); |
57625988 DD |
201 | + if (!buffer) out_of_memory("read_unbuffered"); |
202 | + } | |
203 | + remaining = read_timeout(fd, buffer, bufferSz); | |
184753c5 WD |
204 | + bufferIdx = 0; |
205 | + } | |
57625988 DD |
206 | + |
207 | while (ret == 0) { | |
208 | if (remaining) { | |
209 | len = MIN(len, remaining); | |
210 | - read_loop(fd, buf, len); | |
211 | + memcpy(buf, buffer + bufferIdx, len); | |
212 | + bufferIdx += len; | |
213 | remaining -= len; | |
214 | ret = len; | |
215 | - continue; | |
216 | + break; | |
217 | } | |
218 | ||
219 | read_loop(fd, line, 4); | |
184753c5 | 220 | @@ -460,8 +474,16 @@ |
57625988 DD |
221 | remaining = tag & 0xFFFFFF; |
222 | tag = tag >> 24; | |
223 | ||
224 | - if (tag == MPLEX_BASE) | |
225 | + if (tag == MPLEX_BASE) { | |
226 | + if (!buffer || remaining > bufferSz) { | |
184753c5 | 227 | + buffer = realloc_array(buffer, char, remaining); |
57625988 DD |
228 | + if (!buffer) out_of_memory("read_unbuffered"); |
229 | + bufferSz = remaining; | |
230 | + } | |
231 | + read_loop(fd, buffer, remaining); | |
184753c5 | 232 | + bufferIdx = 0; |
57625988 | 233 | continue; |
184753c5 | 234 | + } |
57625988 DD |
235 | |
236 | tag -= MPLEX_BASE; | |
237 | ||
184753c5 | 238 | @@ -483,6 +505,9 @@ |
57625988 DD |
239 | remaining = 0; |
240 | } | |
184753c5 WD |
241 | |
242 | + if (remaining == 0) | |
57625988 | 243 | + io_flush(); |
184753c5 | 244 | + |
57625988 DD |
245 | return ret; |
246 | } | |
247 | ||
184753c5 | 248 | @@ -499,8 +524,6 @@ |
57625988 | 249 | size_t total=0; |
184753c5 | 250 | |
57625988 DD |
251 | while (total < N) { |
252 | - io_flush(); | |
253 | - | |
184753c5 | 254 | ret = read_unbuffered(fd, buffer + total, N-total); |
57625988 DD |
255 | total += ret; |
256 | } | |
184753c5 | 257 | @@ -683,7 +706,7 @@ |
57625988 DD |
258 | static char *io_buffer; |
259 | static int io_buffer_count; | |
260 | ||
261 | -void io_start_buffering(int fd) | |
262 | +void io_start_buffering_out(int fd) | |
263 | { | |
264 | if (io_buffer) return; | |
265 | multiplex_out_fd = fd; | |
184753c5 | 266 | @@ -692,6 +715,11 @@ |
57625988 DD |
267 | io_buffer_count = 0; |
268 | } | |
269 | ||
270 | +void io_start_buffering_in(int fd) | |
271 | +{ | |
272 | + multiplex_in_fd = fd; | |
273 | +} | |
274 | + | |
275 | /** | |
276 | * Write an message to a multiplexed stream. If this fails then rsync | |
277 | * exits. | |
184753c5 | 278 | @@ -882,7 +910,7 @@ |
57625988 DD |
279 | { |
280 | multiplex_out_fd = fd; | |
281 | io_flush(); | |
282 | - io_start_buffering(fd); | |
283 | + io_start_buffering_out(fd); | |
284 | io_multiplexing_out = 1; | |
285 | } | |
286 | ||
184753c5 WD |
287 | --- orig/main.c 16 Dec 2003 18:02:03 -0000 1.173 |
288 | +++ rsync-2.6.0/main.c 16 Dec 2003 19:12:30 -0000 | |
289 | @@ -380,6 +380,8 @@ | |
57625988 DD |
290 | exit_cleanup(0); |
291 | } | |
292 | ||
184753c5 WD |
293 | + io_start_buffering_in(f_in); |
294 | + io_start_buffering_out(f_out); | |
57625988 DD |
295 | send_files(flist,f_out,f_in); |
296 | io_flush(); | |
297 | report(f_out); | |
184753c5 | 298 | @@ -454,7 +456,7 @@ |
57625988 DD |
299 | close(error_pipe[1]); |
300 | if (f_in != f_out) close(f_in); | |
301 | ||
302 | - io_start_buffering(f_out); | |
303 | + io_start_buffering_out(f_out); | |
304 | ||
305 | io_set_error_fd(error_pipe[0]); | |
306 | ||
184753c5 WD |
307 | @@ -508,6 +510,7 @@ |
308 | } | |
57625988 DD |
309 | } |
310 | ||
184753c5 | 311 | + io_start_buffering_in(f_in); |
57625988 DD |
312 | if (delete_mode && !delete_excluded) |
313 | recv_exclude_list(f_in); | |
314 | ||
184753c5 | 315 | @@ -606,6 +609,7 @@ |
57625988 DD |
316 | extern int cvs_exclude; |
317 | extern int delete_mode; | |
318 | extern int delete_excluded; | |
184753c5 | 319 | + io_start_buffering_out(f_out); |
57625988 DD |
320 | if (cvs_exclude) |
321 | add_cvs_excludes(); | |
184753c5 WD |
322 | if (delete_mode && !delete_excluded) |
323 | @@ -617,7 +621,10 @@ | |
324 | if (verbose > 3) | |
57625988 DD |
325 | rprintf(FINFO,"file list sent\n"); |
326 | ||
184753c5 WD |
327 | + io_flush(); |
328 | + io_start_buffering_out(f_out); | |
57625988 | 329 | send_files(flist,f_out,f_in); |
184753c5 WD |
330 | + io_flush(); |
331 | if (protocol_version >= 24) { | |
332 | /* final goodbye message */ | |
57625988 | 333 | read_int(f_in); |
184753c5 | 334 | @@ -629,6 +636,7 @@ |
57625988 DD |
335 | wait_process(pid, &status); |
336 | } | |
337 | report(-1); | |
338 | + io_flush(); | |
339 | exit_cleanup(status); | |
340 | } | |
341 | ||
184753c5 WD |
342 | --- orig/proto.h 6 Dec 2003 21:07:27 -0000 1.161 |
343 | +++ rsync-2.6.0/proto.h 16 Dec 2003 19:12:30 -0000 | |
344 | @@ -68,6 +68,7 @@ | |
345 | const char *line, int include); | |
346 | void add_cvs_excludes(void); | |
347 | int sparse_end(int f); | |
348 | +int flush_write_file(int f); | |
349 | int write_file(int f,char *buf,size_t len); | |
350 | struct map_struct *map_file(int fd,OFF_T len); | |
351 | char *map_ptr(struct map_struct *map,OFF_T offset,int len); | |
352 | @@ -101,7 +102,8 @@ | |
57625988 DD |
353 | void read_buf(int f,char *buf,size_t len); |
354 | void read_sbuf(int f,char *buf,size_t len); | |
355 | unsigned char read_byte(int f); | |
356 | -void io_start_buffering(int fd); | |
357 | +void io_start_buffering_out(int fd); | |
358 | +void io_start_buffering_in(int fd); | |
359 | void io_flush(void); | |
360 | void io_end_buffering(void); | |
361 | void write_int(int f,int32 x); | |
184753c5 WD |
362 | --- orig/receiver.c 15 Dec 2003 08:14:27 -0000 1.56 |
363 | +++ rsync-2.6.0/receiver.c 16 Dec 2003 19:12:30 -0000 | |
364 | @@ -307,6 +307,8 @@ | |
57625988 DD |
365 | offset += len; |
366 | } | |
367 | ||
184753c5 | 368 | + flush_write_file(fd); |
57625988 | 369 | + |
184753c5 WD |
370 | if (do_progress) |
371 | end_progress(total_size); | |
57625988 | 372 |