To: rsync@lists.samba.org Subject: Rsync performance increase through buffering From: Craig Barratt Message-ID: Date: Sun, 08 Dec 2002 23:48:57 -0800 I've been studying the read and write buffering in rsync and it turns out most I/O is done just a couple of bytes at a time. This means there are lots of system calls, and also most network traffic comprises lots of small packets. The behavior is most extreme when sending/receiving file deltas of identical files. The main case where I/O is buffered is writes from the server (when io multiplexing is on). These are usually buffered in 4092 byte chunks with a 4 byte header. However, reading of these packets is usually unbuffered, and writes from the client are generally not buffered. For example: when receiving 1st phase checksums (6 bytes per block), 2 reads are done: one of 4 bytes and one of 2 bytes, meaning there are 4 system calls (select/read/select/read) per 6 bytes of checksum data). One cost of this is some performance, but a significant issue is that unbuffered writes generate very short (and very many) ethernet packets, which means the overhead is quite large on slow network connections. The initial file_list writing is typically buffered, but reading it on the client is not. There are some other unneeded system calls: - One example is that show_progress() calls gettimeofday() even if do_progress is not set. show_progress() is called on every block, so there is an extra system call per (700 byte) block. (NOTE: this was already fixed in 2.6.0, so it is no longer in this patch.) - Another example is that file_write writes each matching (700 byte) block without buffering, so that's another system call per block. To study this behavior I used rsync-2.5.6cvs and had a benchmark area comprising around 7800 files of total size 530MB. Here are some results doing sends and receives via rsyncd, all on the same machine, with identical source and destination files. In each case --ignore-times (-I) is set, so that every file is processed: - Send test: strace -f rsync -Ir . localhost::test |& wc shows there are about 2,488,775 system calls. - Receive test: strace -f rsync -Ir localhost::test . |& wc shows there are about 1,615,931 system calls. - Rsyncd has a roughly similar numbers of system calls. - Send test from another machine (cygwin/WinXP laptop): tcpdump port 873 |& wc shows there are about 701,111 ethernet packets (many of them only have a 4 byte payload). Since the source and dest files are the same, the send test only wrote 1,738,797 bytes and read 2,139,848 bytes. These results are similar to rsync 2.5.5. Below is a patch to a few files that adds read and write buffering in the places where the I/O was unbuffered, adds buffering to write_file() and removes the unneeded gettimeofday() system call in show_progress(). The results with the patch are: - Send test: 46,835 system calls, versus 2,488,775. - Receive test: 138,367 system calls, versus 1,615,931. - Send test from another machine: 5,255 ethernet packets, versus 701,111. If the tcp/ip/udp/802.3 per-packet overhead is around 60 bytes, that means the base case transfers an extra 42MB of data, even though the useful data is only around 2MB. The absolute running time on the local rsyncd test isn't much different, probably because the test is really disk io limited and system calls on an unloaded linux system are pretty fast. However, on a network test doing a send from cygwin/WinXP to rsyncd on rh-linux the running time improves from about 700 seconds to 215 seconds (with a cpu load of around 17% versus 58%, if you believe cygwin's cpu stats). This is probably an extreme case since the system call penalty in cygwin is high. But I would suspect a significant improvement is possible with a slow network connection, since a lot less data is being sent. Note also that without -I rsync is already very fast, since it skips (most) files based on attributes. With or without this patch the test suite passes except for daemon-gzip-upload. One risk of buffering is the potential for a bug caused by a missing io_flush: deadlock is possible, so try the patch at your own risk... Craig ########################################################################### Note: This is an updated patch for 2.6.0, created by Wayne Davison. ########################################################################### --- orig/fileio.c 6 Dec 2003 21:07:27 -0000 1.8 +++ rsync-2.6.0/fileio.c 16 Dec 2003 19:12:29 -0000 @@ -69,6 +69,21 @@ return len; } + +static char *wf_writeBuf; +static size_t wf_writeBufSize; +static size_t wf_writeBufCnt; + +int flush_write_file(int f) +{ + int ret = write(f, wf_writeBuf, wf_writeBufCnt); + if (ret < 0) + return ret; + /* if (ret < wf_writeBufCnt) ??? */ + wf_writeBufCnt = 0; + return ret; +} + /* * write_file does not allow incomplete writes. It loops internally * until len bytes are written or errno is set. @@ -83,7 +98,22 @@ int len1 = MIN(len, SPARSE_WRITE_SIZE); r1 = write_sparse(f, buf, len1); } else { - r1 = write(f, buf, len); + if (!wf_writeBuf) { + wf_writeBufSize = MAX_MAP_SIZE; + wf_writeBufCnt = 0; + wf_writeBuf = new_array(char, MAX_MAP_SIZE); + if (!wf_writeBuf) out_of_memory("write_file"); + } + r1 = MIN(len, wf_writeBufSize - wf_writeBufCnt); + if (r1) { + memcpy(wf_writeBuf + wf_writeBufCnt, buf, r1); + wf_writeBufCnt += r1; + } + if (wf_writeBufCnt == wf_writeBufSize) { + flush_write_file(f); + if (!r1 && len) + continue; + } } if (r1 <= 0) { if (ret > 0) return ret; --- orig/flist.c 15 Dec 2003 08:10:31 -0000 1.144 +++ rsync-2.6.0/rsync-2.6.0/flist.c 16 Dec 2003 19:12:30 -0000 @@ -925,7 +925,7 @@ flist = flist_new(); if (f != -1) { - io_start_buffering(f); + io_start_buffering_out(f); if (filesfrom_fd >= 0) { if (argv[0] && !push_dir(argv[0], 0)) { rprintf(FERROR, "push_dir %s failed: %s\n", --- orig/io.c 15 Dec 2003 00:54:44 -0000 1.112 +++ rsync-2.6.0/rsync-2.6.0/io.c 16 Dec 2003 19:12:30 -0000 @@ -41,8 +41,8 @@ static int io_multiplexing_out; static int io_multiplexing_in; -static int multiplex_in_fd; -static int multiplex_out_fd; +static int multiplex_in_fd = -1; +static int multiplex_out_fd = -1; static time_t last_io; static int no_flush; @@ -441,17 +441,31 @@ static size_t remaining; int tag, ret = 0; char line[1024]; + static char *buffer; + static size_t bufferIdx = 0; + static size_t bufferSz; - if (!io_multiplexing_in || fd != multiplex_in_fd) + if (fd != multiplex_in_fd) return read_timeout(fd, buf, len); + if (!io_multiplexing_in && remaining == 0) { + if (!buffer) { + bufferSz = 2 * IO_BUFFER_SIZE; + buffer = new_array(char, bufferSz); + if (!buffer) out_of_memory("read_unbuffered"); + } + remaining = read_timeout(fd, buffer, bufferSz); + bufferIdx = 0; + } + while (ret == 0) { if (remaining) { len = MIN(len, remaining); - read_loop(fd, buf, len); + memcpy(buf, buffer + bufferIdx, len); + bufferIdx += len; remaining -= len; ret = len; - continue; + break; } read_loop(fd, line, 4); @@ -460,8 +474,16 @@ remaining = tag & 0xFFFFFF; tag = tag >> 24; - if (tag == MPLEX_BASE) + if (tag == MPLEX_BASE) { + if (!buffer || remaining > bufferSz) { + buffer = realloc_array(buffer, char, remaining); + if (!buffer) out_of_memory("read_unbuffered"); + bufferSz = remaining; + } + read_loop(fd, buffer, remaining); + bufferIdx = 0; continue; + } tag -= MPLEX_BASE; @@ -483,6 +505,9 @@ remaining = 0; } + if (remaining == 0) + io_flush(); + return ret; } @@ -499,8 +524,6 @@ size_t total=0; while (total < N) { - io_flush(); - ret = read_unbuffered(fd, buffer + total, N-total); total += ret; } @@ -683,7 +706,7 @@ static char *io_buffer; static int io_buffer_count; -void io_start_buffering(int fd) +void io_start_buffering_out(int fd) { if (io_buffer) return; multiplex_out_fd = fd; @@ -692,6 +715,11 @@ io_buffer_count = 0; } +void io_start_buffering_in(int fd) +{ + multiplex_in_fd = fd; +} + /** * Write an message to a multiplexed stream. If this fails then rsync * exits. @@ -882,7 +910,7 @@ { multiplex_out_fd = fd; io_flush(); - io_start_buffering(fd); + io_start_buffering_out(fd); io_multiplexing_out = 1; } --- orig/main.c 16 Dec 2003 18:02:03 -0000 1.173 +++ rsync-2.6.0/main.c 16 Dec 2003 19:12:30 -0000 @@ -380,6 +380,8 @@ exit_cleanup(0); } + io_start_buffering_in(f_in); + io_start_buffering_out(f_out); send_files(flist,f_out,f_in); io_flush(); report(f_out); @@ -454,7 +456,7 @@ close(error_pipe[1]); if (f_in != f_out) close(f_in); - io_start_buffering(f_out); + io_start_buffering_out(f_out); io_set_error_fd(error_pipe[0]); @@ -508,6 +510,7 @@ } } + io_start_buffering_in(f_in); if (delete_mode && !delete_excluded) recv_exclude_list(f_in); @@ -606,6 +609,7 @@ extern int cvs_exclude; extern int delete_mode; extern int delete_excluded; + io_start_buffering_out(f_out); if (cvs_exclude) add_cvs_excludes(); if (delete_mode && !delete_excluded) @@ -617,7 +621,10 @@ if (verbose > 3) rprintf(FINFO,"file list sent\n"); + io_flush(); + io_start_buffering_out(f_out); send_files(flist,f_out,f_in); + io_flush(); if (protocol_version >= 24) { /* final goodbye message */ read_int(f_in); @@ -629,6 +636,7 @@ wait_process(pid, &status); } report(-1); + io_flush(); exit_cleanup(status); } --- orig/proto.h 6 Dec 2003 21:07:27 -0000 1.161 +++ rsync-2.6.0/proto.h 16 Dec 2003 19:12:30 -0000 @@ -68,6 +68,7 @@ const char *line, int include); void add_cvs_excludes(void); int sparse_end(int f); +int flush_write_file(int f); int write_file(int f,char *buf,size_t len); struct map_struct *map_file(int fd,OFF_T len); char *map_ptr(struct map_struct *map,OFF_T offset,int len); @@ -101,7 +102,8 @@ void read_buf(int f,char *buf,size_t len); void read_sbuf(int f,char *buf,size_t len); unsigned char read_byte(int f); -void io_start_buffering(int fd); +void io_start_buffering_out(int fd); +void io_start_buffering_in(int fd); void io_flush(void); void io_end_buffering(void); void write_int(int f,int32 x); --- orig/receiver.c 15 Dec 2003 08:14:27 -0000 1.56 +++ rsync-2.6.0/receiver.c 16 Dec 2003 19:12:30 -0000 @@ -307,6 +307,8 @@ offset += len; } + flush_write_file(fd); + if (do_progress) end_progress(total_size);