3 BUGS ---------------------------------------------------------------
5 There seems to be a bug with hardlinks
7 mbp/2 build$ ls -l /tmp/a /tmp/b -i
10 2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
11 2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
12 2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
13 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
14 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
15 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
16 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
17 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
21 2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
22 2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
23 2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
24 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
25 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
26 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
27 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
28 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
29 mbp/2 build$ rm -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
30 building file list ... done
31 created directory /tmp/b
37 wrote 350 bytes read 52 bytes 804.00 bytes/sec
38 total size is 232 speedup is 0.58
39 mbp/2 build$ rm -r /tmp/b
40 mbp/2 build$ ls -l /tmp/b
41 ls: /tmp/b: No such file or directory
42 mbp/2 build$ rm -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
43 rm: cannot remove `/tmp/b': No such file or directory
44 mbp/2 build$ rm -f -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
45 building file list ... done
46 created directory /tmp/b
52 wrote 350 bytes read 52 bytes 804.00 bytes/sec
53 total size is 232 speedup is 0.58
54 mbp/2 build$ ls -l /tmp/b
56 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
57 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
58 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
59 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
60 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
61 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
62 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
63 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
64 mbp/2 build$ ls -l /tmp/a
66 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
67 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
68 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
69 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
70 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
71 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
72 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
73 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
76 Progress indicator can produce corrupt output when transferring directories:
79 main/binary-arm/admin/
81 main/binary-arm/comm/8.56kB/s 0:00:52
82 main/binary-arm/devel/
84 main/binary-arm/editors/
85 main/binary-arm/electronics/s 0:00:53
86 main/binary-arm/games/
87 main/binary-arm/graphics/
88 main/binary-arm/hamradio/
89 main/binary-arm/interpreters/
90 main/binary-arm/libs/6.61kB/s 0:00:54
96 I don't think we handle this properly on systems that don't have the
100 Part of the regression suite should be making sure that we don't
101 break backwards compatibility: old clients vs new servers and so
102 on. Ideally we would test the cross product of versions.
104 It might be sufficient to test downloads from well-known public
105 rsync servers running different versions of rsync. This will give
106 some testing and also be the most common case for having different
107 versions and not being able to upgrade.
109 --no-blocking-io might be broken
111 in the same way as --no-whole-file; somebody needs to check.
114 DAEMON --------------------------------------------------------------
116 server-imposed bandwidth limits
120 There are already some patches to do this.
122 BitKeeper uses a server whose login shell is set to bkd. That's
123 probably a reasonable approach.
126 FEATURES ------------------------------------------------------------
129 --dry-run is insufficiently dry
131 Mark Santcroos points out that -n fails to list files which have
132 only metadata changes, though it probably should.
134 There may be a Debian bug about this as well.
139 If the platform doesn't support it, then don't even try.
141 If running as non-root, then don't fail, just give a warning.
142 (There was a thread about this a while ago?)
144 http://lists.samba.org/pipermail/rsync/2001-August/thread.html
145 http://lists.samba.org/pipermail/rsync/2001-September/thread.html
150 Avoids traversal. Better option than a pile of --include statements
151 for people who want to generate the file list using a find(1)
157 Perhaps allow supplementary groups to be specified in rsyncd.conf;
158 then make the first one the primary gid and all the rest be
162 File list structure in memory
164 Rather than one big array, perhaps have a tree in memory mirroring
167 This might make sorting much faster! (I'm not sure it's a big CPU
170 It might also reduce memory use in storing repeated directory names
171 -- again I'm not sure this is a problem.
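
A rough sketch of the tree idea, in Python rather than rsync's C. It assumes
the input lists regular files only and uses a plain nested dictionary as the
node type; build_tree, flat_name_bytes and tree_name_bytes are made-up names,
not anything in flist.c. It only shows that each directory name gets stored
once, it says nothing about whether sorting gets faster.

```python
def build_tree(paths):
    """Build a nested dict mirroring the directory structure.

    Directories map to dicts; files map to None.
    """
    root = {}
    for path in paths:
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue
        node = root
        for name in parts[:-1]:
            node = node.setdefault(name, {})
        node[parts[-1]] = None
    return root


def flat_name_bytes(paths):
    """Name storage if every entry keeps its full path string."""
    return sum(len(p) for p in paths)


def tree_name_bytes(tree):
    """Name storage if each path component is stored once in the tree."""
    total = 0
    for name, child in tree.items():
        total += len(name)
        if child is not None:
            total += tree_name_bytes(child)
    return total
```

Whether the saving matters in practice is still the open question above; the
per-node overhead of a real tree in C could easily eat it.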
175 Traverse just one directory at a time. Tridge says it's possible.
177 At the moment rsync reads the whole file list into memory at the
178 start, which makes us use a lot of memory and also not pipeline
179 network access as much as we could.
182 Handling duplicate names
184 We need to be careful of duplicate names getting into the file list.
185 See clean_flist(). This could happen if multiple arguments include
188 I think duplicates are only a problem if they're both flowing
189 through the pipeline at the same time. For example we might have
190 updated the first occurrence after reading the checksums for the
191 second. So possibly we just need to make sure that we don't have
192 both in the pipeline at the same time.
194 Possibly if we did one directory at a time that would be sufficient.
196 Alternatively we could pre-process the arguments to make sure no
197 duplicates will ever be inserted. There could be some bad cases
198 when we're collapsing symlinks.
200 We could have a hash table.
202 The root of the problem is that we do not want more than one file
203 list entry referring to the same file. At first glance there are
204 several ways this could happen: symlinks, hardlinks, and repeated
205 names on the command line.
207 If names are repeated on the command line, they may be present in
208 different forms, perhaps by traversing directory paths in different
209 ways, traversing paths including symlinks. Also we need to allow
210 for expansion of globs by rsync.
212 At the moment, clean_flist() requires having the entire file list in
213 memory. Duplicate names are detected just by a string comparison.
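
To make the two approaches concrete, here is a small Python sketch. It is
not the code in clean_flist(); dedupe_sorted and dedupe_hashed are
hypothetical names. The first needs the whole list in memory so it can be
sorted; the second uses a hash set and could run as names arrive. The
optional key argument shows why a bare string comparison misses names that
are spelled differently.

```python
import posixpath


def dedupe_sorted(names):
    """Drop duplicates by sorting and comparing each name to its neighbour."""
    out = []
    for name in sorted(names):
        if not out or out[-1] != name:
            out.append(name)
    return out


def dedupe_hashed(names, key=lambda name: name):
    """Drop duplicates with a hash set, keeping first-seen order.

    key maps a name to the string used for comparison, e.g.
    posixpath.normpath to treat "./a//b" and "a/b" as the same name.
    """
    seen = set()
    out = []
    for name in names:
        k = key(name)
        if k not in seen:
            seen.add(k)
            out.append(name)
    return out
```

Note that normalizing the string still does not catch names that reach the
same file through a symlink.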
215 We don't need to worry about hard links causing duplicates because
216 files are never updated in place. Similarly for symlinks.
218 I think even if we're using a different symlink mode we don't need
221 Unless we're really clever this will introduce a protocol
222 incompatibility, so we need to be able to accept the old format as
228 At exit, show how much memory was used for the file list, etc.
230 Also we do a weird exponential-growth allocation in flist.c. I'm
231 not sure this makes sense with modern mallocs. At any rate it will
232 make us allocate a huge amount of memory for large file lists.
237 At the moment hardlink handling is very expensive, so it's off by
238 default. It does not need to be so.
240 Since most of the solutions are rather intertwined with the file
241 list it is probably better to fix that first, although fixing
242 hardlinks is possibly simpler.
244 We can rule out hardlinked directories since they will probably
245 screw us up in all kinds of ways. They simply should not be used.
247 At the moment rsync only cares about hardlinks to regular files. I
248 guess you could also use them for sockets, devices and other beasts,
249 but I have not seen them.
251 When trying to reproduce hard links, we only need to worry about
252 files that have more than one name (nlinks>1 && !S_ISDIR).
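
As an illustration of that test, a Python sketch (not rsync code;
hardlink_groups is a hypothetical helper) that walks a tree and groups
names by (st_dev, st_ino), applying the same nlinks>1 && !S_ISDIR
condition. Because it returns names rather than inode numbers, the result
for a source tree and a destination tree can be compared directly.

```python
import os
import stat


def hardlink_groups(root):
    """Return sorted lists of names under root that share an inode.

    Only entries with more than one link that are not directories are
    considered. Names are relative to root.
    """
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1 and not stat.S_ISDIR(st.st_mode):
                key = (st.st_dev, st.st_ino)
                groups.setdefault(key, []).append(os.path.relpath(path, root))
    return sorted(sorted(names) for names in groups.values())
```

A check after "rsync -avH /tmp/a/ /tmp/b" would then be that
hardlink_groups("/tmp/a") equals hardlink_groups("/tmp/b").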
254 The basic point of this is to discover alternate names that refer to
255 the same file. All operations, including creating the file and
256 writing modifications to it need only to be done for the first name.
257 For all later names, we just create the link and then leave it
260 If hard links are to be preserved:
262 Before the generator/receiver fork, the list of files is received
263 from the sender (recv_file_list), and a table for detecting hard
266 The generator looks for hard links within the file list and does
267 not send checksums for them, though it does send other metadata.
269 The sender sends the device number and inode with file entries, so
270 that files are uniquely identified.
272 The receiver goes through and creates hard links (do_hard_links)
273 after all data has been written, but before directory permissions
276 At the moment device and inum are sent as 4-byte integers, which
277 will probably cause problems on large filesystems. On Linux the
278 kernel uses 64-bit ino_t's internally, and people will soon have
279 filesystems big enough to use them. We ought to follow NFS4 in
280 using 64-bit device and inode identification, perhaps with a
281 protocol version bump.
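
To show the kind of problem meant here, a small Python sketch. The function
names and the little-endian byte order are assumptions made for the
example, not a description of the actual wire format. Two different inodes
whose numbers differ by a multiple of 2^32 become indistinguishable once
truncated to 4 bytes, so they would wrongly look like hard links to each
other.

```python
import struct


def pack_id32(dev, ino):
    """Pack device and inode as two 4-byte integers, dropping the high bits."""
    return struct.pack("<II", dev & 0xFFFFFFFF, ino & 0xFFFFFFFF)


def pack_id64(dev, ino):
    """Pack device and inode as two 8-byte integers."""
    return struct.pack("<QQ", dev, ino)


def same_file(id_a, id_b):
    """Hard-link detection treats equal packed ids as the same file."""
    return id_a == id_b
```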
283 Once we've seen all the names for a particular file, we no longer
284 need to think about it and we can deallocate the memory.
286 We can also have the case where there are links to a file that are
287 not in the tree being transferred. There's nothing we can do about
288 that. Because we rename the destination into place after writing,
289 any hardlinks to the old file are always going to be orphaned. In
290 fact that is almost necessary because otherwise we'd get really
291 confused if we were generating checksums for one name of a file and
294 At the moment the code seems to make a whole second copy of the file
295 list, which seems unnecessary.
297 We should have a test case that exercises hard links. Since it
298 might be hard to compare ./tls output where the inodes change we
299 might need a little program to check whether several names refer to
304 Implement suggestions from http://www.kame.net/newsletter/19980604/
305 and ftp://ftp.iij.ad.jp/pub/RFC/rfc2553.txt
307 If a host has multiple addresses, then try to connect to all
308 in order until we get through. (getaddrinfo may return multiple
309 addresses.) This is kind of implemented already.
311 Possibly also when starting as a server we may need to listen on
312 multiple passive addresses. This might be a bit harder, because we
313 may need to select on all of them. Hm.
315 Define a syntax for IPv6 literal addresses. Since they include
316 colons, they tend to break most naming systems, including ours.
317 Based on the HTTP IPv6 syntax, I think we should use
319 rsync://[::1]/foo/bar
322 which should just take a small change to the parser code.
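
A sketch of what the parser change might look like, written in Python for
brevity rather than as a patch to the C code. parse_rsync_url is a
hypothetical function; it handles only the host, optional port and path,
and ignores any user@ prefix.

```python
def parse_rsync_url(url):
    """Split rsync://host[:port]/path into (host, port, path).

    host may be a bracketed IPv6 literal such as [::1]; the brackets are
    stripped. port is None when absent.
    """
    prefix = "rsync://"
    if not url.startswith(prefix):
        raise ValueError("not an rsync URL: %r" % url)
    rest = url[len(prefix):]
    if rest.startswith("["):
        # Bracketed literal: colons inside the brackets belong to the host.
        end = rest.find("]")
        if end < 0:
            raise ValueError("unterminated IPv6 literal: %r" % url)
        host = rest[1:end]
        tail, _slash, path = rest[end + 1:].partition("/")
        if tail and not tail.startswith(":"):
            raise ValueError("junk after IPv6 literal: %r" % url)
        port_text = tail[1:]
    else:
        authority, _slash, path = rest.partition("/")
        host, _colon, port_text = authority.partition(":")
    port = int(port_text) if port_text else None
    return host, port, path
```

The only new case is the branch for a leading "[": everything else is the
usual split on the first colon and the first slash.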
327 If we hang or get SIGINT, then explain where we were up to. Perhaps
328 have a static buffer that contains the current function name, or
329 some kind of description of what we were trying to do. This is a
330 little easier on people than needing to run strace/truss.
332 "The dungeon collapses! You are killed." Rather than "unexpected
333 eof" give a message that is more detailed if possible and also more
336 If we get an error writing to a socket, then we should perhaps
337 continue trying to read to see if an error message comes across
338 explaining why the socket is closed. I'm not sure if this would
339 work, but it would certainly make our messages more helpful.
341 What happens if a directory is missing -x attributes? Do we lose
342 our load? (Debian #28416) Probably fixed now, but a test case
348 Device major/minor numbers should be at least 32 bits each. See
349 http://lists.samba.org/pipermail/rsync/2001-November/005357.html
351 Transfer ACLs. Need to think of a standard representation.
352 Probably better not to even try to convert between NT and POSIX.
353 Possibly can share some code with Samba.
357 With the current common --include '*/' --exclude '*' pattern, people
358 can end up with many empty directories. We might avoid this by
359 lazily creating such directories.
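
A toy Python sketch of lazy creation, assuming the transfer is just a
mapping from relative path to file contents (transfer_lazily is a made-up
name). A directory is created only at the moment a file is about to be
written into it, so directories that match '*/' but receive no files never
appear on the destination.

```python
import os


def transfer_lazily(dest_root, files):
    """Write files under dest_root, creating parent directories on demand.

    files maps a relative path to the bytes to write.
    """
    for relpath, data in files.items():
        target = os.path.join(dest_root, relpath)
        # Parents are created here, not when the directory is first seen.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)
```

The cost is that an intentionally empty directory is no longer transferred,
so this would probably need to be optional.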
364 Perhaps don't use our own zlib.
368 - will automatically be up to date with bugfixes in zlib
370 - can leave it out for small rsync on e.g. recovery disks
372 - can use a shared library
374 - avoids people breaking rsync by trying to do this themselves and
377 Should we ship zlib for systems that don't have it, or require
378 people to install it separately?
380 Apparently this will make us incompatible with versions of rsync
381 that use the patched version of zlib. Probably the simplest way to
382 do this is to just disable gzip (with a warning) when talking to old
388 Perhaps flush stdout after each filename, so that people trying to
389 monitor progress in a log file can do so more easily. See
390 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
392 At the moment connections that just get a list of modules are not logged,
395 If a child of the rsync daemon dies with a signal, we should notice
396 that when we reap it and log a message.
398 Keep stderr and stdout properly separated (Debian #23626)
400 After we get the @RSYNCD greeting from the server, we know its
401 version but we have not yet sent the command line, so we could just
402 remove the -z option if the server is too old.
404 For ssh invocation it's not so simple, because we actually use the
405 command line to start the remote process. However, we only actually
406 do compression in token.c, and we could therefore, once we discover
407 the remote version, emit an error if it's too old. I'm not sure if
408 that's a good tradeoff or not.
413 There are already some patches to do this.
417 Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
418 HTTP Basic Proxy-Authentication.
420 Multiple schemes are possible, up to and including the insanity that
421 is NTLM, but Basic probably covers most cases.
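
For the Basic case, a Python sketch of turning the proposed RSYNC_PROXY
value into a host, a port and a Proxy-Authorization header line.
proxy_settings is a hypothetical helper; it also accepts the existing bare
host:port form by adding a scheme before parsing.

```python
import base64
from urllib.parse import unquote, urlsplit


def proxy_settings(rsync_proxy):
    """Parse an RSYNC_PROXY value into (host, port, auth_header).

    auth_header is None when the value carries no user:pass part.
    """
    if "://" not in rsync_proxy:
        rsync_proxy = "http://" + rsync_proxy
    parts = urlsplit(rsync_proxy)
    header = None
    if parts.username is not None:
        creds = "%s:%s" % (unquote(parts.username),
                           unquote(parts.password or ""))
        token = base64.b64encode(creds.encode("utf-8")).decode("ascii")
        header = "Proxy-Authorization: Basic " + token
    return parts.hostname, parts.port, header
```

The header would go out with the CONNECT request. Basic sends the password
effectively in the clear, which is worth a note in the man page.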
425 Add --with-socks, and then perhaps a command-line option to put them
426 on or off. This might be more reliable than LD_PRELOAD hacks.
430 rsync to a FAT partition on a Unix machine doesn't work very well
431 at the moment. I think we get errors about invalid filenames and
432 perhaps also trying to do atomic renames.
434 I guess the code to do this is currently #ifdef'd on Windows; perhaps
435 we ought to intelligently fall back to it on Unix too.
440 <Rasmus> mbp: hey, how about an rsync option that just gives you the
441 summary without the list of files? And perhaps gives more
442 information like the number of new files, number of changed,
444 <mbp> Rasmus: nice idea
445 <mbp> there is --stats
446 <mbp> but at the moment it's very tridge-oriented
447 <mbp> rather than user-friendly
448 <mbp> it would be nice to improve it
449 <mbp> that would also work well with --dryrun
453 Rather than storing the file list in memory, store it in a TDB.
455 This *might* make memory usage lower while building the file list.
457 Hashtable lookup will mean files are not transmitted in order,
460 This would neatly eliminate one of the major post-fork shared data
466 On 12 Mar 2002, Dave Dykstra <dwd@bell-labs.com> wrote:
467 > If we would add an option to do that functionality, I would vote for one
468 > that was more general which could mask off any set of permission bits and
469 > possibly add any set of bits. Perhaps a chmod-like syntax if it could be
470 > implemented simply.
472 I think that would be good too. For example, people uploading files
473 to a web server might like to say
475 rsync -avzP --chmod a+rX ./ sourcefrog.net:/home/www/sourcefrog/
477 Ideally the patch would implement as many of the GNU chmod semantics
478 as possible. I think the mode parser should be a separate function
479 that passes back something like (mask,set) description to the rest of
480 the program. For bonus points there would be a test case for the
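
One way the mode parser could look, sketched in Python rather than C.
Because X depends on the file (a directory, or already executable by
someone), this sketch applies the spec to a mode directly instead of
returning a fixed (mask,set) pair. apply_chmod and its is_dir parameter are
made-up names. The umask handling GNU chmod does when no who letter is
given, and the s and t bits, are left out.

```python
_WHO_BITS = {"u": 0o700, "g": 0o070, "o": 0o007, "a": 0o777}
_PERM_BITS = {"r": 0o444, "w": 0o222, "x": 0o111}


def apply_chmod(mode, spec, is_dir=False):
    """Apply a symbolic spec such as "a+rX" or "go-w,u=rw" to mode."""
    original = mode
    for clause in spec.split(","):
        # Split the clause into who letters, one operator, and perm letters.
        i = 0
        while i < len(clause) and clause[i] in "ugoa":
            i += 1
        if i >= len(clause) or clause[i] not in "+-=":
            raise ValueError("bad mode clause: %r" % clause)
        who, op, perms = clause[:i] or "a", clause[i], clause[i + 1:]
        who_mask = 0
        for w in who:
            who_mask |= _WHO_BITS[w]
        bits = 0
        for p in perms:
            if p == "X":
                # Execute only for directories or already-executable files.
                if is_dir or original & 0o111:
                    bits |= 0o111
            elif p in _PERM_BITS:
                bits |= _PERM_BITS[p]
            else:
                raise ValueError("unsupported permission: %r" % p)
        bits &= who_mask
        if op == "+":
            mode |= bits
        elif op == "-":
            mode &= ~bits
        else:
            mode = (mode & ~who_mask) | bits
    return mode
```

With the web server example above, a+rX turns a 0600 file into 0644 and a
0700 directory into 0755.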
488 Allow people to specify the diff command. (Might want to use wdiff,
491 Just diff the temporary file with the destination file, and delete
492 the tmp file rather than moving it into place.
494 Interaction with --partial.
496 Security interactions with daemon mode?
498 (Suggestion from david.e.sewell)
501 Incorrect timestamps (Debian #100295)
503 A bit hard to believe, but apparently it happens.
506 Check "refuse options works"
508 We need a test case for this...
510 Was this broken when we changed to popt?
513 PERFORMANCE ----------------------------------------------------------
517 If we're doing a local transfer, or using -W, then perhaps don't
518 send the file checksum. If we're doing a local transfer, then
519 calculating MD4 checksums uses 90% of CPU and is unlikely to be
522 Indeed for transfers over zlib or ssh we can also rely on the
523 transport to have quite strong protection against corruption.
525 Perhaps we should have an option to disable this, analogous to
526 --whole-file, although it would default to disabled. The file
527 checksum takes up a definite space in the protocol -- we can either
528 set it to 0, or perhaps just leave it out.
532 Perhaps borrow an assembler MD4 from someone?
534 Make sure we call MD4 with properly-sized blocks whenever possible
535 to avoid copying into the residue region?
539 Test whether this is actually faster than just using malloc(). If
540 it's not (anymore), throw it out.
543 PLATFORMS ------------------------------------------------------------
547 Don't detach, because this messes up --srvany.
549 http://sources.redhat.com/ml/cygwin/2001-08/msg00234.html
551 According to "Effective TCP/IP Programming" (??) close() on a socket
552 has incorrect behaviour on Windows -- it sends a RST packet to the
553 other side, which gives a "connection reset by peer" error. On that
554 platform we should probably do shutdown() instead. However, on Unix
555 we are correct to call close(), because shutdown() discards
559 DEVELOPMENT ----------------------------------------------------------
563 Build rsync with SPLINT to try to find security holes. Add
564 annotations as necessary. Keep track of the number of warnings
565 found initially, and see how many of them are real bugs, or real
566 security bugs. Knowing the percentage of likely hits would be
567 really interesting for other projects.
571 Something that just keeps running rsync continuously over a data set
572 likely to generate problems.
576 Run current rsync versions against significant past releases.
580 jra recommends Valgrind:
582 http://devel-home.kde.org/~sewardj/
588 Build tar file; upload
590 Send announcement to mailing list and c.o.l.a.
592 Make freshmeat announcement
598 TESTING --------------------------------------------------------------
602 Part of the regression suite should be making sure that we don't
603 break backwards compatibility: old clients vs new servers and so
604 on. Ideally we would test both up and down from the current release
607 We might need to omit broken old versions, or versions in which
608 particular functionality is broken.
610 It might be sufficient to test downloads from well-known public
611 rsync servers running different versions of rsync. This will give
612 some testing and also be the most common case for having different
613 versions and not being able to upgrade.
616 Test on kernel source
618 Download all versions of kernel; unpack, sync between them. Also
619 sync between uncompressed tarballs. Compare directories after
622 Use local mode; ssh; daemon; --whole-file and --no-whole-file.
624 Use awk to pull out the 'speedup' number for each transfer. Make
630 Sparse and non-sparse
634 Insert bytes, delete bytes, swap blocks, ...
636 configure option to enable dangerous tests
638 If tests are skipped, say why.
640 Test daemon feature to disallow particular options.
642 Pipe program that makes slow/jerky connections.
644 Versions of read() and write() that corrupt the stream, or abruptly fail
646 Separate makefile target to run rough tests -- or perhaps just run
649 Test "refuse options" works
651 What about for --recursive?
653 If you specify an unrecognized option here, you should get an error.
656 DOCUMENTATION --------------------------------------------------------
660 Keep list of open issues and todos on the web site
662 Update web site from CVS
665 Perhaps redo manual as SGML
667 The man page is getting rather large, and there is more information
668 that ought to be added.
670 Texinfo source is probably a dying format.
672 Linuxdoc looks like the most likely contender. I know DocBook is
673 favoured by some people, but it's so bloody verbose, even with emacs
677 BUILD FARM -----------------------------------------------------------
681 AMDAHL UTS (Dave Dykstra)
683 Cygwin (on different versions of Win32?)
685 HP-UX variants (via HP?)
690 LOGGING --------------------------------------------------------------
692 Perhaps flush stdout after each filename, so that people trying to
693 monitor progress in a log file can do so more easily. See
694 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
696 At the moment connections that just get a list of modules are not logged,
699 If a child of the rsync daemon dies with a signal, we should notice
700 that when we reap it and log a message.
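
A sketch of the check at reap time, in Python instead of C. The wait-status
tests correspond to the POSIX WIFSIGNALED/WTERMSIG macros;
describe_child_exit and the message wording are invented for the example.

```python
import os
import signal


def describe_child_exit(pid, status):
    """Return a log line for a reaped child, or None for a clean exit.

    status is the raw wait status as returned by os.waitpid().
    """
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "signal %d" % signum
        return "child %d died with %s" % (pid, name)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) != 0:
        return "child %d exited with status %d" % (pid, os.WEXITSTATUS(status))
    return None
```

In the daemon this would be called from the SIGCHLD reaping loop, with any
non-None result passed to the log.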
702 Keep stderr and stdout properly separated (Debian #23626)
704 Use a separate function for reporting errors; prefix it with
705 "rsync:" or "rsync(remote)", or perhaps even "rsync(local
710 Indicate whether files are new, updated, or deleted
712 At end of transfer, show how many files were or were not transferred
717 Explain *why* every file is transferred or not (e.g. "local mtime
718 123123 newer than 1283198")
723 Add an rsyncd.conf parameter to turn on debugging on the server.
727 NICE -----------------------------------------------------------------
729 --no-detach and --no-fork options
731 Very useful for debugging. Also good when running under a
732 daemon-monitoring process that tries to restart the service when the
735 hang/timeout friendliness
739 Change to using gettext(). Probably need to ship this for platforms
742 Solicit translations.
744 Does anyone care? Before we bother modifying the code, we ought to
745 get the manual translated first, because that's possibly more useful
746 and at any rate demonstrates desire.
750 Write a small emulation of interactive ftp as a Python program
751 that calls rsync. Commands such as "cd", "ls", "ls *.c" etc map
752 fairly directly into rsync commands: it just needs to remember the
753 current host, directory and so on. We can probably even do
754 completion of remote filenames.
757 RELATED PROJECTS -----------------------------------------------------
759 http://rsync.samba.org/rsync-and-debian/
763 Exhaustive, tortuous testing
767 rsyncsplit as alternative to real integration with gzip?
769 reverse rsync over HTTP Range
771 Goswin Brederlow suggested this on Debian; I think tridge and I
772 talked about it previously in relation to rproxy.