Updated patch for the 2.6.0 release.
[rsync/rsync-patches.git] / craigb-perf.diff
CommitLineData
57625988
DD
1To: rsync@lists.samba.org
2Subject: Rsync performance increase through buffering
3From: Craig Barratt <craig@atheros.com>
4Message-ID: <auto-000002369223@atheros.com>
5Date: Sun, 08 Dec 2002 23:48:57 -0800
6
7I've been studying the read and write buffering in rsync and it turns
8out most I/O is done just a couple of bytes at a time. This means there
9are lots of system calls, and also most network traffic comprises lots
10of small packets. The behavior is most extreme when sending/receiving
11file deltas of identical files.
12
13The main case where I/O is buffered is writes from the server (when
14io multiplexing is on). These are usually buffered in 4092 byte
15chunks with a 4 byte header. However, reading of these packets is
16usually unbuffered, and writes from the client are generally not
17buffered. For example: when receiving 1st phase checksums (6 bytes
18per block), 2 reads are done: one of 4 bytes and one of 2 bytes,
19meaning there are 4 system calls (select/read/select/read) per 6
20bytes of checksum data).
21
22One cost of this is some performance, but a significant issue is that
23unbuffered writes generate very short (and very many) ethernet packets,
24which means the overhead is quite large on slow network connections.
25
26The initial file_list writing is typically buffered, but reading it on
27the client is not.
28
29There are some other unneeded system calls:
30
31 - One example is that show_progress() calls gettimeofday() even
32 if do_progress is not set. show_progress() is called on every
33 block, so there is an extra system call per (700 byte) block.
184753c5
WD
34 (NOTE: this was already fixed in 2.6.0, so it is no longer in
35 this patch.)
57625988
DD
36
37 - Another example is that file_write writes each matching (700 byte)
38 block without buffering, so that's another system call per block.
39
40To study this behavior I used rsync-2.5.6cvs and had a benchmark area
41comprising around 7800 files of total size 530MB.
42
43Here are some results doing sends and receives via rsyncd, all on the
44same machine, with identical source and destination files. In each
45case --ignore-times (-I) is set, so that every file is processed:
46
47 - Send test:
48
49 strace -f rsync -Ir . localhost::test |& wc
50
51 shows there are about 2,488,775 system calls.
52
53 - Receive test:
54
55 strace -f rsync -Ir localhost::test . |& wc
56
57 shows there are about 1,615,931 system calls.
58
59 - Rsyncd has a roughly similar numbers of system calls.
60
61 - Send test from another machine (cygwin/WinXP laptop):
62
63 tcpdump port 873 |& wc
64
65 shows there are about 701,111 ethernet packets (many of them only
66 have a 4 byte payload).
67
68Since the source and dest files are the same, the send test only
69wrote 1,738,797 bytes and read 2,139,848 bytes.
70
71These results are similar to rsync 2.5.5.
72
73Below is a patch to a few files that adds read and write buffering in
74the places where the I/O was unbuffered, adds buffering to write_file()
75and removes the unneeded gettimeofday() system call in show_progress().
76
77The results with the patch are:
78
79 - Send test: 46,835 system calls, versus 2,488,775.
80
81 - Receive test: 138,367 system calls, versus 1,615,931.
82
83 - Send test from another machine: 5,255 ethernet packets, versus 701,111.
84 If the tcp/ip/udp/802.3 per-packet overhead is around 60 bytes, that
85 means the base case transfers an extra 42MB of data, even though the
86 useful data is only around 2MB.
87
88The absolute running time on the local rsyncd test isn't much different,
89probably because the test is really disk io limited and system calls on
90an unloaded linux system are pretty fast.
91
92However, on a network test doing a send from cygwin/WinXP to rsyncd
93on rh-linux the running time improves from about 700 seconds to 215
94seconds (with a cpu load of around 17% versus 58%, if you believe
95cygwin's cpu stats). This is probably an extreme case since the system
96call penalty in cygwin is high. But I would suspect a significant
97improvement is possible with a slow network connection, since a lot
98less data is being sent.
99
100Note also that without -I rsync is already very fast, since it skips
101(most) files based on attributes.
102
103With or without this patch the test suite passes except for
104daemon-gzip-upload. One risk of buffering is the potential for
105a bug caused by a missing io_flush: deadlock is possible, so try
106the patch at your own risk...
107
108Craig
109
110###########################################################################
184753c5
WD
111Note: This is an updated patch for 2.6.0, created by Wayne Davison.
112###########################################################################
113--- orig/fileio.c 6 Dec 2003 21:07:27 -0000 1.8
114+++ rsync-2.6.0/fileio.c 16 Dec 2003 19:12:29 -0000
115@@ -69,6 +69,21 @@
116 return len;
117 }
57625988 118
57625988 119+
184753c5
WD
120+static char *wf_writeBuf;
121+static size_t wf_writeBufSize;
122+static size_t wf_writeBufCnt;
123+
124+int flush_write_file(int f)
125+{
126+ int ret = write(f, wf_writeBuf, wf_writeBufCnt);
127+ if (ret < 0)
57625988 128+ return ret;
184753c5
WD
129+ /* if (ret < wf_writeBufCnt) ??? */
130+ wf_writeBufCnt = 0;
131+ return ret;
132+}
133+
134 /*
135 * write_file does not allow incomplete writes. It loops internally
136 * until len bytes are written or errno is set.
137@@ -83,7 +98,22 @@
138 int len1 = MIN(len, SPARSE_WRITE_SIZE);
139 r1 = write_sparse(f, buf, len1);
140 } else {
141- r1 = write(f, buf, len);
142+ if (!wf_writeBuf) {
143+ wf_writeBufSize = MAX_MAP_SIZE;
144+ wf_writeBufCnt = 0;
145+ wf_writeBuf = new_array(char, MAX_MAP_SIZE);
146+ if (!wf_writeBuf) out_of_memory("write_file");
147+ }
148+ r1 = MIN(len, wf_writeBufSize - wf_writeBufCnt);
149+ if (r1) {
150+ memcpy(wf_writeBuf + wf_writeBufCnt, buf, r1);
151+ wf_writeBufCnt += r1;
152+ }
153+ if (wf_writeBufCnt == wf_writeBufSize) {
154+ flush_write_file(f);
155+ if (!r1 && len)
156+ continue;
157+ }
158 }
159 if (r1 <= 0) {
160 if (ret > 0) return ret;
161--- orig/flist.c 15 Dec 2003 08:10:31 -0000 1.144
162+++ rsync-2.6.0/rsync-2.6.0/flist.c 16 Dec 2003 19:12:30 -0000
163@@ -925,7 +925,7 @@
57625988
DD
164 flist = flist_new();
165
166 if (f != -1) {
167- io_start_buffering(f);
168+ io_start_buffering_out(f);
184753c5
WD
169 if (filesfrom_fd >= 0) {
170 if (argv[0] && !push_dir(argv[0], 0)) {
171 rprintf(FERROR, "push_dir %s failed: %s\n",
172--- orig/io.c 15 Dec 2003 00:54:44 -0000 1.112
173+++ rsync-2.6.0/rsync-2.6.0/io.c 16 Dec 2003 19:12:30 -0000
57625988
DD
174@@ -41,8 +41,8 @@
175
176 static int io_multiplexing_out;
177 static int io_multiplexing_in;
178-static int multiplex_in_fd;
179-static int multiplex_out_fd;
180+static int multiplex_in_fd = -1;
181+static int multiplex_out_fd = -1;
182 static time_t last_io;
183 static int no_flush;
184
184753c5 185@@ -441,17 +441,31 @@
57625988
DD
186 static size_t remaining;
187 int tag, ret = 0;
188 char line[1024];
184753c5
WD
189+ static char *buffer;
190+ static size_t bufferIdx = 0;
191+ static size_t bufferSz;
57625988
DD
192
193- if (!io_multiplexing_in || fd != multiplex_in_fd)
194+ if (fd != multiplex_in_fd)
195 return read_timeout(fd, buf, len);
196
197+ if (!io_multiplexing_in && remaining == 0) {
198+ if (!buffer) {
199+ bufferSz = 2 * IO_BUFFER_SIZE;
184753c5 200+ buffer = new_array(char, bufferSz);
57625988
DD
201+ if (!buffer) out_of_memory("read_unbuffered");
202+ }
203+ remaining = read_timeout(fd, buffer, bufferSz);
184753c5
WD
204+ bufferIdx = 0;
205+ }
57625988
DD
206+
207 while (ret == 0) {
208 if (remaining) {
209 len = MIN(len, remaining);
210- read_loop(fd, buf, len);
211+ memcpy(buf, buffer + bufferIdx, len);
212+ bufferIdx += len;
213 remaining -= len;
214 ret = len;
215- continue;
216+ break;
217 }
218
219 read_loop(fd, line, 4);
184753c5 220@@ -460,8 +474,16 @@
57625988
DD
221 remaining = tag & 0xFFFFFF;
222 tag = tag >> 24;
223
224- if (tag == MPLEX_BASE)
225+ if (tag == MPLEX_BASE) {
226+ if (!buffer || remaining > bufferSz) {
184753c5 227+ buffer = realloc_array(buffer, char, remaining);
57625988
DD
228+ if (!buffer) out_of_memory("read_unbuffered");
229+ bufferSz = remaining;
230+ }
231+ read_loop(fd, buffer, remaining);
184753c5 232+ bufferIdx = 0;
57625988 233 continue;
184753c5 234+ }
57625988
DD
235
236 tag -= MPLEX_BASE;
237
184753c5 238@@ -483,6 +505,9 @@
57625988
DD
239 remaining = 0;
240 }
184753c5
WD
241
242+ if (remaining == 0)
57625988 243+ io_flush();
184753c5 244+
57625988
DD
245 return ret;
246 }
247
184753c5 248@@ -499,8 +524,6 @@
57625988 249 size_t total=0;
184753c5 250
57625988
DD
251 while (total < N) {
252- io_flush();
253-
184753c5 254 ret = read_unbuffered(fd, buffer + total, N-total);
57625988
DD
255 total += ret;
256 }
184753c5 257@@ -683,7 +706,7 @@
57625988
DD
258 static char *io_buffer;
259 static int io_buffer_count;
260
261-void io_start_buffering(int fd)
262+void io_start_buffering_out(int fd)
263 {
264 if (io_buffer) return;
265 multiplex_out_fd = fd;
184753c5 266@@ -692,6 +715,11 @@
57625988
DD
267 io_buffer_count = 0;
268 }
269
270+void io_start_buffering_in(int fd)
271+{
272+ multiplex_in_fd = fd;
273+}
274+
275 /**
276 * Write an message to a multiplexed stream. If this fails then rsync
277 * exits.
184753c5 278@@ -882,7 +910,7 @@
57625988
DD
279 {
280 multiplex_out_fd = fd;
281 io_flush();
282- io_start_buffering(fd);
283+ io_start_buffering_out(fd);
284 io_multiplexing_out = 1;
285 }
286
184753c5
WD
287--- orig/main.c 16 Dec 2003 18:02:03 -0000 1.173
288+++ rsync-2.6.0/main.c 16 Dec 2003 19:12:30 -0000
289@@ -380,6 +380,8 @@
57625988
DD
290 exit_cleanup(0);
291 }
292
184753c5
WD
293+ io_start_buffering_in(f_in);
294+ io_start_buffering_out(f_out);
57625988
DD
295 send_files(flist,f_out,f_in);
296 io_flush();
297 report(f_out);
184753c5 298@@ -454,7 +456,7 @@
57625988
DD
299 close(error_pipe[1]);
300 if (f_in != f_out) close(f_in);
301
302- io_start_buffering(f_out);
303+ io_start_buffering_out(f_out);
304
305 io_set_error_fd(error_pipe[0]);
306
184753c5
WD
307@@ -508,6 +510,7 @@
308 }
57625988
DD
309 }
310
184753c5 311+ io_start_buffering_in(f_in);
57625988
DD
312 if (delete_mode && !delete_excluded)
313 recv_exclude_list(f_in);
314
184753c5 315@@ -606,6 +609,7 @@
57625988
DD
316 extern int cvs_exclude;
317 extern int delete_mode;
318 extern int delete_excluded;
184753c5 319+ io_start_buffering_out(f_out);
57625988
DD
320 if (cvs_exclude)
321 add_cvs_excludes();
184753c5
WD
322 if (delete_mode && !delete_excluded)
323@@ -617,7 +621,10 @@
324 if (verbose > 3)
57625988
DD
325 rprintf(FINFO,"file list sent\n");
326
184753c5
WD
327+ io_flush();
328+ io_start_buffering_out(f_out);
57625988 329 send_files(flist,f_out,f_in);
184753c5
WD
330+ io_flush();
331 if (protocol_version >= 24) {
332 /* final goodbye message */
57625988 333 read_int(f_in);
184753c5 334@@ -629,6 +636,7 @@
57625988
DD
335 wait_process(pid, &status);
336 }
337 report(-1);
338+ io_flush();
339 exit_cleanup(status);
340 }
341
184753c5
WD
342--- orig/proto.h 6 Dec 2003 21:07:27 -0000 1.161
343+++ rsync-2.6.0/proto.h 16 Dec 2003 19:12:30 -0000
344@@ -68,6 +68,7 @@
345 const char *line, int include);
346 void add_cvs_excludes(void);
347 int sparse_end(int f);
348+int flush_write_file(int f);
349 int write_file(int f,char *buf,size_t len);
350 struct map_struct *map_file(int fd,OFF_T len);
351 char *map_ptr(struct map_struct *map,OFF_T offset,int len);
352@@ -101,7 +102,8 @@
57625988
DD
353 void read_buf(int f,char *buf,size_t len);
354 void read_sbuf(int f,char *buf,size_t len);
355 unsigned char read_byte(int f);
356-void io_start_buffering(int fd);
357+void io_start_buffering_out(int fd);
358+void io_start_buffering_in(int fd);
359 void io_flush(void);
360 void io_end_buffering(void);
361 void write_int(int f,int32 x);
184753c5
WD
362--- orig/receiver.c 15 Dec 2003 08:14:27 -0000 1.56
363+++ rsync-2.6.0/receiver.c 16 Dec 2003 19:12:30 -0000
364@@ -307,6 +307,8 @@
57625988
DD
365 offset += len;
366 }
367
184753c5 368+ flush_write_file(fd);
57625988 369+
184753c5
WD
370 if (do_progress)
371 end_progress(total_size);
57625988 372