Add Craig Barratt's performance improving patch to the contributed patches
[rsync/rsync-patches.git] / craigb-perf.diff
CommitLineData
57625988
DD
1To: rsync@lists.samba.org
2Subject: Rsync performance increase through buffering
3From: Craig Barratt <craig@atheros.com>
4Message-ID: <auto-000002369223@atheros.com>
5Date: Sun, 08 Dec 2002 23:48:57 -0800
6
7I've been studying the read and write buffering in rsync and it turns
8out most I/O is done just a couple of bytes at a time. This means there
9are lots of system calls, and also most network traffic comprises lots
10of small packets. The behavior is most extreme when sending/receiving
11file deltas of identical files.
12
13The main case where I/O is buffered is writes from the server (when
14io multiplexing is on). These are usually buffered in 4092 byte
15chunks with a 4 byte header. However, reading of these packets is
16usually unbuffered, and writes from the client are generally not
17buffered. For example: when receiving 1st phase checksums (6 bytes
18per block), 2 reads are done: one of 4 bytes and one of 2 bytes,
19meaning there are 4 system calls (select/read/select/read) per 6
20bytes of checksum data).
21
22One cost of this is some performance, but a significant issue is that
23unbuffered writes generate very short (and very many) ethernet packets,
24which means the overhead is quite large on slow network connections.
25
26The initial file_list writing is typically buffered, but reading it on
27the client is not.
28
29There are some other unneeded system calls:
30
31 - One example is that show_progress() calls gettimeofday() even
32 if do_progress is not set. show_progress() is called on every
33 block, so there is an extra system call per (700 byte) block.
34
35 - Another example is that file_write writes each matching (700 byte)
36 block without buffering, so that's another system call per block.
37
38To study this behavior I used rsync-2.5.6cvs and had a benchmark area
39comprising around 7800 files of total size 530MB.
40
41Here are some results doing sends and receives via rsyncd, all on the
42same machine, with identical source and destination files. In each
43case --ignore-times (-I) is set, so that every file is processed:
44
45 - Send test:
46
47 strace -f rsync -Ir . localhost::test |& wc
48
49 shows there are about 2,488,775 system calls.
50
51 - Receive test:
52
53 strace -f rsync -Ir localhost::test . |& wc
54
55 shows there are about 1,615,931 system calls.
56
57 - Rsyncd has a roughly similar numbers of system calls.
58
59 - Send test from another machine (cygwin/WinXP laptop):
60
61 tcpdump port 873 |& wc
62
63 shows there are about 701,111 ethernet packets (many of them only
64 have a 4 byte payload).
65
66Since the source and dest files are the same, the send test only
67wrote 1,738,797 bytes and read 2,139,848 bytes.
68
69These results are similar to rsync 2.5.5.
70
71Below is a patch to a few files that adds read and write buffering in
72the places where the I/O was unbuffered, adds buffering to write_file()
73and removes the unneeded gettimeofday() system call in show_progress().
74
75The results with the patch are:
76
77 - Send test: 46,835 system calls, versus 2,488,775.
78
79 - Receive test: 138,367 system calls, versus 1,615,931.
80
81 - Send test from another machine: 5,255 ethernet packets, versus 701,111.
82 If the tcp/ip/udp/802.3 per-packet overhead is around 60 bytes, that
83 means the base case transfers an extra 42MB of data, even though the
84 useful data is only around 2MB.
85
86The absolute running time on the local rsyncd test isn't much different,
87probably because the test is really disk io limited and system calls on
88an unloaded linux system are pretty fast.
89
90However, on a network test doing a send from cygwin/WinXP to rsyncd
91on rh-linux the running time improves from about 700 seconds to 215
92seconds (with a cpu load of around 17% versus 58%, if you believe
93cygwin's cpu stats). This is probably an extreme case since the system
94call penalty in cygwin is high. But I would suspect a significant
95improvement is possible with a slow network connection, since a lot
96less data is being sent.
97
98Note also that without -I rsync is already very fast, since it skips
99(most) files based on attributes.
100
101With or without this patch the test suite passes except for
102daemon-gzip-upload. One risk of buffering is the potential for
103a bug caused by a missing io_flush: deadlock is possible, so try
104the patch at your own risk...
105
106Craig
107
108###########################################################################
109diff -bur rsync/fileio.c rsync-craig/fileio.c
110--- rsync/fileio.c Fri Jan 25 15:07:34 2002
111+++ rsync-craig/fileio.c Sat Dec 7 22:21:10 2002
112@@ -76,7 +76,35 @@
113 int ret = 0;
114
115 if (!sparse_files) {
116- return write(f,buf,len);
117+ static char *writeBuf;
118+ static size_t writeBufSize;
119+ static size_t writeBufCnt;
120+
121+ if ( !writeBuf ) {
122+ writeBufSize = MAX_MAP_SIZE;
123+ writeBufCnt = 0;
124+ writeBuf = (char*)malloc(MAX_MAP_SIZE);
125+ if (!writeBuf) out_of_memory("write_file");
126+ }
127+ ret = len;
128+ do {
129+ if ( buf && writeBufCnt < writeBufSize ) {
130+ size_t copyLen = len;
131+ if ( copyLen > writeBufSize - writeBufCnt ) {
132+ copyLen = writeBufSize - writeBufCnt;
133+ }
134+ memcpy(writeBuf + writeBufCnt, buf, copyLen);
135+ writeBufCnt += copyLen;
136+ buf += copyLen;
137+ len -= copyLen;
138+ }
139+ if ( !buf || writeBufCnt == writeBufSize ) {
140+ int thisRet = write(f, writeBuf, writeBufCnt);
141+ if ( thisRet < 0 ) return thisRet;
142+ writeBufCnt = 0;
143+ }
144+ } while ( buf && len > 0 );
145+ return ret;
146 }
147
148 while (len>0) {
149diff -bur rsync/flist.c rsync-craig/flist.c
150--- rsync/flist.c Sat Jul 27 11:01:21 2002
151+++ rsync-craig/flist.c Sun Dec 8 16:28:14 2002
152@@ -889,7 +889,7 @@
153 flist = flist_new();
154
155 if (f != -1) {
156- io_start_buffering(f);
157+ io_start_buffering_out(f);
158 }
159
160 for (i = 0; i < argc; i++) {
161diff -bur rsync/io.c rsync-craig/io.c
162--- rsync/io.c Wed Apr 10 19:11:50 2002
163+++ rsync-craig/io.c Sun Dec 8 17:54:23 2002
164@@ -41,8 +41,8 @@
165
166 static int io_multiplexing_out;
167 static int io_multiplexing_in;
168-static int multiplex_in_fd;
169-static int multiplex_out_fd;
170+static int multiplex_in_fd = -1;
171+static int multiplex_out_fd = -1;
172 static time_t last_io;
173 static int no_flush;
174
175@@ -286,17 +286,31 @@
176 static size_t remaining;
177 int tag, ret = 0;
178 char line[1024];
179+ static char *buffer;
180+ static size_t bufferIdx = 0;
181+ static size_t bufferSz;
182
183- if (!io_multiplexing_in || fd != multiplex_in_fd)
184+ if (fd != multiplex_in_fd)
185 return read_timeout(fd, buf, len);
186
187+ if (!io_multiplexing_in && remaining == 0) {
188+ if (!buffer) {
189+ bufferSz = 2 * IO_BUFFER_SIZE;
190+ buffer = malloc(bufferSz);
191+ if (!buffer) out_of_memory("read_unbuffered");
192+ }
193+ remaining = read_timeout(fd, buffer, bufferSz);
194+ bufferIdx = 0;
195+ }
196+
197 while (ret == 0) {
198 if (remaining) {
199 len = MIN(len, remaining);
200- read_loop(fd, buf, len);
201+ memcpy(buf, buffer + bufferIdx, len);
202+ bufferIdx += len;
203 remaining -= len;
204 ret = len;
205- continue;
206+ break;
207 }
208
209 read_loop(fd, line, 4);
210@@ -305,8 +319,16 @@
211 remaining = tag & 0xFFFFFF;
212 tag = tag >> 24;
213
214- if (tag == MPLEX_BASE)
215+ if (tag == MPLEX_BASE) {
216+ if (!buffer || remaining > bufferSz) {
217+ buffer = Realloc(buffer, remaining);
218+ if (!buffer) out_of_memory("read_unbuffered");
219+ bufferSz = remaining;
220+ }
221+ read_loop(fd, buffer, remaining);
222+ bufferIdx = 0;
223 continue;
224+ }
225
226 tag -= MPLEX_BASE;
227
228@@ -327,7 +349,9 @@
229 rprintf((enum logcode) tag, "%s", line);
230 remaining = 0;
231 }
232-
233+ if (remaining == 0) {
234+ io_flush();
235+ }
236 return ret;
237 }
238
239@@ -344,8 +368,6 @@
240 size_t total=0;
241
242 while (total < N) {
243- io_flush();
244-
245 ret = read_unbuffered (fd, buffer + total, N-total);
246 total += ret;
247 }
248@@ -531,7 +553,7 @@
249 static char *io_buffer;
250 static int io_buffer_count;
251
252-void io_start_buffering(int fd)
253+void io_start_buffering_out(int fd)
254 {
255 if (io_buffer) return;
256 multiplex_out_fd = fd;
257@@ -540,6 +562,11 @@
258 io_buffer_count = 0;
259 }
260
261+void io_start_buffering_in(int fd)
262+{
263+ multiplex_in_fd = fd;
264+}
265+
266 /**
267 * Write an message to a multiplexed stream. If this fails then rsync
268 * exits.
269@@ -726,7 +753,7 @@
270 {
271 multiplex_out_fd = fd;
272 io_flush();
273- io_start_buffering(fd);
274+ io_start_buffering_out(fd);
275 io_multiplexing_out = 1;
276 }
277
278diff -bur rsync/main.c rsync-craig/main.c
279--- rsync/main.c Thu Aug 1 13:46:59 2002
280+++ rsync-craig/main.c Sun Dec 8 17:39:07 2002
281@@ -346,6 +346,8 @@
282 exit_cleanup(0);
283 }
284
285+ io_start_buffering_in(f_in);
286+ io_start_buffering_out(f_out);
287 send_files(flist,f_out,f_in);
288 io_flush();
289 report(f_out);
290@@ -421,7 +423,7 @@
291 close(error_pipe[1]);
292 if (f_in != f_out) close(f_in);
293
294- io_start_buffering(f_out);
295+ io_start_buffering_out(f_out);
296
297 io_set_error_fd(error_pipe[0]);
298
299@@ -476,6 +478,7 @@
300 }
301 }
302
303+ io_start_buffering_in(f_in);
304 if (delete_mode && !delete_excluded)
305 recv_exclude_list(f_in);
306
307@@ -569,6 +572,7 @@
308 extern int cvs_exclude;
309 extern int delete_mode;
310 extern int delete_excluded;
311+ io_start_buffering_out(f_out);
312 if (cvs_exclude)
313 add_cvs_excludes();
314 if (delete_mode && !delete_excluded)
315@@ -578,7 +582,10 @@
316 if (verbose > 3)
317 rprintf(FINFO,"file list sent\n");
318
319+ io_flush();
320+ io_start_buffering_out(f_out);
321 send_files(flist,f_out,f_in);
322+ io_flush();
323 if (remote_version >= 24) {
324 /* final goodbye message */
325 read_int(f_in);
326@@ -590,6 +597,7 @@
327 wait_process(pid, &status);
328 }
329 report(-1);
330+ io_flush();
331 exit_cleanup(status);
332 }
333
334diff -bur rsync/progress.c rsync-craig/progress.c
335--- rsync/progress.c Sun Apr 7 22:28:57 2002
336+++ rsync-craig/progress.c Sat Dec 7 18:57:19 2002
337@@ -97,6 +97,8 @@
338 extern int do_progress, am_server;
339 struct timeval now;
340
341+ if (!do_progress) return;
342+
343 gettimeofday(&now, NULL);
344
345 if (!start_time.tv_sec && !start_time.tv_usec) {
346diff -bur rsync/proto.h rsync-craig/proto.h
347--- rsync/proto.h Wed Jul 31 17:37:02 2002
348+++ rsync-craig/proto.h Sun Dec 8 16:27:55 2002
349@@ -102,7 +102,8 @@
350 void read_buf(int f,char *buf,size_t len);
351 void read_sbuf(int f,char *buf,size_t len);
352 unsigned char read_byte(int f);
353-void io_start_buffering(int fd);
354+void io_start_buffering_out(int fd);
355+void io_start_buffering_in(int fd);
356 void io_flush(void);
357 void io_end_buffering(void);
358 void write_int(int f,int32 x);
359diff -bur rsync/receiver.c rsync-craig/receiver.c
360--- rsync/receiver.c Tue May 28 08:42:51 2002
361+++ rsync-craig/receiver.c Sat Dec 7 22:09:04 2002
362@@ -273,6 +273,11 @@
363 offset += len;
364 }
365
366+ /*
367+ * do a write flush
368+ */
369+ write_file(fd, NULL, 0);
370+
371 end_progress(total_size);
372
373 if (fd != -1 && offset > 0 && sparse_end(fd) != 0) {