Commit | Line | Data |
---|---|---|
57625988 DD |
1 | To: rsync@lists.samba.org |
2 | Subject: Rsync performance increase through buffering | |
3 | From: Craig Barratt <craig@atheros.com> | |
4 | Message-ID: <auto-000002369223@atheros.com> | |
5 | Date: Sun, 08 Dec 2002 23:48:57 -0800 | |
6 | ||
7 | I've been studying the read and write buffering in rsync and it turns | |
8 | out most I/O is done just a couple of bytes at a time. This means there | |
9 | are lots of system calls, and also most network traffic comprises lots | |
10 | of small packets. The behavior is most extreme when sending/receiving | |
11 | file deltas of identical files. | |
12 | ||
13 | The main case where I/O is buffered is writes from the server (when | |
14 | io multiplexing is on). These are usually buffered in 4092 byte | |
15 | chunks with a 4 byte header. However, reading of these packets is | |
16 | usually unbuffered, and writes from the client are generally not | |
17 | buffered. For example: when receiving 1st phase checksums (6 bytes | |
18 | per block), 2 reads are done: one of 4 bytes and one of 2 bytes, | |
19 | meaning there are 4 system calls (select/read/select/read) per 6 | |
20 | bytes of checksum data). | |
21 | ||
22 | One cost of this is some performance, but a significant issue is that | |
23 | unbuffered writes generate very short (and very many) ethernet packets, | |
24 | which means the overhead is quite large on slow network connections. | |
25 | ||
26 | The initial file_list writing is typically buffered, but reading it on | |
27 | the client is not. | |
28 | ||
29 | There are some other unneeded system calls: | |
30 | ||
31 | - One example is that show_progress() calls gettimeofday() even | |
32 | if do_progress is not set. show_progress() is called on every | |
33 | block, so there is an extra system call per (700 byte) block. | |
34 | ||
35 | - Another example is that file_write writes each matching (700 byte) | |
36 | block without buffering, so that's another system call per block. | |
37 | ||
38 | To study this behavior I used rsync-2.5.6cvs and had a benchmark area | |
39 | comprising around 7800 files of total size 530MB. | |
40 | ||
41 | Here are some results doing sends and receives via rsyncd, all on the | |
42 | same machine, with identical source and destination files. In each | |
43 | case --ignore-times (-I) is set, so that every file is processed: | |
44 | ||
45 | - Send test: | |
46 | ||
47 | strace -f rsync -Ir . localhost::test |& wc | |
48 | ||
49 | shows there are about 2,488,775 system calls. | |
50 | ||
51 | - Receive test: | |
52 | ||
53 | strace -f rsync -Ir localhost::test . |& wc | |
54 | ||
55 | shows there are about 1,615,931 system calls. | |
56 | ||
57 | - Rsyncd has a roughly similar numbers of system calls. | |
58 | ||
59 | - Send test from another machine (cygwin/WinXP laptop): | |
60 | ||
61 | tcpdump port 873 |& wc | |
62 | ||
63 | shows there are about 701,111 ethernet packets (many of them only | |
64 | have a 4 byte payload). | |
65 | ||
66 | Since the source and dest files are the same, the send test only | |
67 | wrote 1,738,797 bytes and read 2,139,848 bytes. | |
68 | ||
69 | These results are similar to rsync 2.5.5. | |
70 | ||
71 | Below is a patch to a few files that adds read and write buffering in | |
72 | the places where the I/O was unbuffered, adds buffering to write_file() | |
73 | and removes the unneeded gettimeofday() system call in show_progress(). | |
74 | ||
75 | The results with the patch are: | |
76 | ||
77 | - Send test: 46,835 system calls, versus 2,488,775. | |
78 | ||
79 | - Receive test: 138,367 system calls, versus 1,615,931. | |
80 | ||
81 | - Send test from another machine: 5,255 ethernet packets, versus 701,111. | |
82 | If the tcp/ip/udp/802.3 per-packet overhead is around 60 bytes, that | |
83 | means the base case transfers an extra 42MB of data, even though the | |
84 | useful data is only around 2MB. | |
85 | ||
86 | The absolute running time on the local rsyncd test isn't much different, | |
87 | probably because the test is really disk io limited and system calls on | |
88 | an unloaded linux system are pretty fast. | |
89 | ||
90 | However, on a network test doing a send from cygwin/WinXP to rsyncd | |
91 | on rh-linux the running time improves from about 700 seconds to 215 | |
92 | seconds (with a cpu load of around 17% versus 58%, if you believe | |
93 | cygwin's cpu stats). This is probably an extreme case since the system | |
94 | call penalty in cygwin is high. But I would suspect a significant | |
95 | improvement is possible with a slow network connection, since a lot | |
96 | less data is being sent. | |
97 | ||
98 | Note also that without -I rsync is already very fast, since it skips | |
99 | (most) files based on attributes. | |
100 | ||
101 | With or without this patch the test suite passes except for | |
102 | daemon-gzip-upload. One risk of buffering is the potential for | |
103 | a bug caused by a missing io_flush: deadlock is possible, so try | |
104 | the patch at your own risk... | |
105 | ||
106 | Craig | |
107 | ||
108 | ########################################################################### | |
109 | diff -bur rsync/fileio.c rsync-craig/fileio.c | |
110 | --- rsync/fileio.c Fri Jan 25 15:07:34 2002 | |
111 | +++ rsync-craig/fileio.c Sat Dec 7 22:21:10 2002 | |
112 | @@ -76,7 +76,35 @@ | |
113 | int ret = 0; | |
114 | ||
115 | if (!sparse_files) { | |
116 | - return write(f,buf,len); | |
117 | + static char *writeBuf; | |
118 | + static size_t writeBufSize; | |
119 | + static size_t writeBufCnt; | |
120 | + | |
121 | + if ( !writeBuf ) { | |
122 | + writeBufSize = MAX_MAP_SIZE; | |
123 | + writeBufCnt = 0; | |
124 | + writeBuf = (char*)malloc(MAX_MAP_SIZE); | |
125 | + if (!writeBuf) out_of_memory("write_file"); | |
126 | + } | |
127 | + ret = len; | |
128 | + do { | |
129 | + if ( buf && writeBufCnt < writeBufSize ) { | |
130 | + size_t copyLen = len; | |
131 | + if ( copyLen > writeBufSize - writeBufCnt ) { | |
132 | + copyLen = writeBufSize - writeBufCnt; | |
133 | + } | |
134 | + memcpy(writeBuf + writeBufCnt, buf, copyLen); | |
135 | + writeBufCnt += copyLen; | |
136 | + buf += copyLen; | |
137 | + len -= copyLen; | |
138 | + } | |
139 | + if ( !buf || writeBufCnt == writeBufSize ) { | |
140 | + int thisRet = write(f, writeBuf, writeBufCnt); | |
141 | + if ( thisRet < 0 ) return thisRet; | |
142 | + writeBufCnt = 0; | |
143 | + } | |
144 | + } while ( buf && len > 0 ); | |
145 | + return ret; | |
146 | } | |
147 | ||
148 | while (len>0) { | |
149 | diff -bur rsync/flist.c rsync-craig/flist.c | |
150 | --- rsync/flist.c Sat Jul 27 11:01:21 2002 | |
151 | +++ rsync-craig/flist.c Sun Dec 8 16:28:14 2002 | |
152 | @@ -889,7 +889,7 @@ | |
153 | flist = flist_new(); | |
154 | ||
155 | if (f != -1) { | |
156 | - io_start_buffering(f); | |
157 | + io_start_buffering_out(f); | |
158 | } | |
159 | ||
160 | for (i = 0; i < argc; i++) { | |
161 | diff -bur rsync/io.c rsync-craig/io.c | |
162 | --- rsync/io.c Wed Apr 10 19:11:50 2002 | |
163 | +++ rsync-craig/io.c Sun Dec 8 17:54:23 2002 | |
164 | @@ -41,8 +41,8 @@ | |
165 | ||
166 | static int io_multiplexing_out; | |
167 | static int io_multiplexing_in; | |
168 | -static int multiplex_in_fd; | |
169 | -static int multiplex_out_fd; | |
170 | +static int multiplex_in_fd = -1; | |
171 | +static int multiplex_out_fd = -1; | |
172 | static time_t last_io; | |
173 | static int no_flush; | |
174 | ||
175 | @@ -286,17 +286,31 @@ | |
176 | static size_t remaining; | |
177 | int tag, ret = 0; | |
178 | char line[1024]; | |
179 | + static char *buffer; | |
180 | + static size_t bufferIdx = 0; | |
181 | + static size_t bufferSz; | |
182 | ||
183 | - if (!io_multiplexing_in || fd != multiplex_in_fd) | |
184 | + if (fd != multiplex_in_fd) | |
185 | return read_timeout(fd, buf, len); | |
186 | ||
187 | + if (!io_multiplexing_in && remaining == 0) { | |
188 | + if (!buffer) { | |
189 | + bufferSz = 2 * IO_BUFFER_SIZE; | |
190 | + buffer = malloc(bufferSz); | |
191 | + if (!buffer) out_of_memory("read_unbuffered"); | |
192 | + } | |
193 | + remaining = read_timeout(fd, buffer, bufferSz); | |
194 | + bufferIdx = 0; | |
195 | + } | |
196 | + | |
197 | while (ret == 0) { | |
198 | if (remaining) { | |
199 | len = MIN(len, remaining); | |
200 | - read_loop(fd, buf, len); | |
201 | + memcpy(buf, buffer + bufferIdx, len); | |
202 | + bufferIdx += len; | |
203 | remaining -= len; | |
204 | ret = len; | |
205 | - continue; | |
206 | + break; | |
207 | } | |
208 | ||
209 | read_loop(fd, line, 4); | |
210 | @@ -305,8 +319,16 @@ | |
211 | remaining = tag & 0xFFFFFF; | |
212 | tag = tag >> 24; | |
213 | ||
214 | - if (tag == MPLEX_BASE) | |
215 | + if (tag == MPLEX_BASE) { | |
216 | + if (!buffer || remaining > bufferSz) { | |
217 | + buffer = Realloc(buffer, remaining); | |
218 | + if (!buffer) out_of_memory("read_unbuffered"); | |
219 | + bufferSz = remaining; | |
220 | + } | |
221 | + read_loop(fd, buffer, remaining); | |
222 | + bufferIdx = 0; | |
223 | continue; | |
224 | + } | |
225 | ||
226 | tag -= MPLEX_BASE; | |
227 | ||
228 | @@ -327,7 +349,9 @@ | |
229 | rprintf((enum logcode) tag, "%s", line); | |
230 | remaining = 0; | |
231 | } | |
232 | - | |
233 | + if (remaining == 0) { | |
234 | + io_flush(); | |
235 | + } | |
236 | return ret; | |
237 | } | |
238 | ||
239 | @@ -344,8 +368,6 @@ | |
240 | size_t total=0; | |
241 | ||
242 | while (total < N) { | |
243 | - io_flush(); | |
244 | - | |
245 | ret = read_unbuffered (fd, buffer + total, N-total); | |
246 | total += ret; | |
247 | } | |
248 | @@ -531,7 +553,7 @@ | |
249 | static char *io_buffer; | |
250 | static int io_buffer_count; | |
251 | ||
252 | -void io_start_buffering(int fd) | |
253 | +void io_start_buffering_out(int fd) | |
254 | { | |
255 | if (io_buffer) return; | |
256 | multiplex_out_fd = fd; | |
257 | @@ -540,6 +562,11 @@ | |
258 | io_buffer_count = 0; | |
259 | } | |
260 | ||
261 | +void io_start_buffering_in(int fd) | |
262 | +{ | |
263 | + multiplex_in_fd = fd; | |
264 | +} | |
265 | + | |
266 | /** | |
267 | * Write an message to a multiplexed stream. If this fails then rsync | |
268 | * exits. | |
269 | @@ -726,7 +753,7 @@ | |
270 | { | |
271 | multiplex_out_fd = fd; | |
272 | io_flush(); | |
273 | - io_start_buffering(fd); | |
274 | + io_start_buffering_out(fd); | |
275 | io_multiplexing_out = 1; | |
276 | } | |
277 | ||
278 | diff -bur rsync/main.c rsync-craig/main.c | |
279 | --- rsync/main.c Thu Aug 1 13:46:59 2002 | |
280 | +++ rsync-craig/main.c Sun Dec 8 17:39:07 2002 | |
281 | @@ -346,6 +346,8 @@ | |
282 | exit_cleanup(0); | |
283 | } | |
284 | ||
285 | + io_start_buffering_in(f_in); | |
286 | + io_start_buffering_out(f_out); | |
287 | send_files(flist,f_out,f_in); | |
288 | io_flush(); | |
289 | report(f_out); | |
290 | @@ -421,7 +423,7 @@ | |
291 | close(error_pipe[1]); | |
292 | if (f_in != f_out) close(f_in); | |
293 | ||
294 | - io_start_buffering(f_out); | |
295 | + io_start_buffering_out(f_out); | |
296 | ||
297 | io_set_error_fd(error_pipe[0]); | |
298 | ||
299 | @@ -476,6 +478,7 @@ | |
300 | } | |
301 | } | |
302 | ||
303 | + io_start_buffering_in(f_in); | |
304 | if (delete_mode && !delete_excluded) | |
305 | recv_exclude_list(f_in); | |
306 | ||
307 | @@ -569,6 +572,7 @@ | |
308 | extern int cvs_exclude; | |
309 | extern int delete_mode; | |
310 | extern int delete_excluded; | |
311 | + io_start_buffering_out(f_out); | |
312 | if (cvs_exclude) | |
313 | add_cvs_excludes(); | |
314 | if (delete_mode && !delete_excluded) | |
315 | @@ -578,7 +582,10 @@ | |
316 | if (verbose > 3) | |
317 | rprintf(FINFO,"file list sent\n"); | |
318 | ||
319 | + io_flush(); | |
320 | + io_start_buffering_out(f_out); | |
321 | send_files(flist,f_out,f_in); | |
322 | + io_flush(); | |
323 | if (remote_version >= 24) { | |
324 | /* final goodbye message */ | |
325 | read_int(f_in); | |
326 | @@ -590,6 +597,7 @@ | |
327 | wait_process(pid, &status); | |
328 | } | |
329 | report(-1); | |
330 | + io_flush(); | |
331 | exit_cleanup(status); | |
332 | } | |
333 | ||
334 | diff -bur rsync/progress.c rsync-craig/progress.c | |
335 | --- rsync/progress.c Sun Apr 7 22:28:57 2002 | |
336 | +++ rsync-craig/progress.c Sat Dec 7 18:57:19 2002 | |
337 | @@ -97,6 +97,8 @@ | |
338 | extern int do_progress, am_server; | |
339 | struct timeval now; | |
340 | ||
341 | + if (!do_progress) return; | |
342 | + | |
343 | gettimeofday(&now, NULL); | |
344 | ||
345 | if (!start_time.tv_sec && !start_time.tv_usec) { | |
346 | diff -bur rsync/proto.h rsync-craig/proto.h | |
347 | --- rsync/proto.h Wed Jul 31 17:37:02 2002 | |
348 | +++ rsync-craig/proto.h Sun Dec 8 16:27:55 2002 | |
349 | @@ -102,7 +102,8 @@ | |
350 | void read_buf(int f,char *buf,size_t len); | |
351 | void read_sbuf(int f,char *buf,size_t len); | |
352 | unsigned char read_byte(int f); | |
353 | -void io_start_buffering(int fd); | |
354 | +void io_start_buffering_out(int fd); | |
355 | +void io_start_buffering_in(int fd); | |
356 | void io_flush(void); | |
357 | void io_end_buffering(void); | |
358 | void write_int(int f,int32 x); | |
359 | diff -bur rsync/receiver.c rsync-craig/receiver.c | |
360 | --- rsync/receiver.c Tue May 28 08:42:51 2002 | |
361 | +++ rsync-craig/receiver.c Sat Dec 7 22:09:04 2002 | |
362 | @@ -273,6 +273,11 @@ | |
363 | offset += len; | |
364 | } | |
365 | ||
366 | + /* | |
367 | + * do a write flush | |
368 | + */ | |
369 | + write_file(fd, NULL, 0); | |
370 | + | |
371 | end_progress(total_size); | |
372 | ||
373 | if (fd != -1 && offset > 0 && sparse_end(fd) != 0) { |