3 BUGS ---------------------------------------------------------------
4 Fix hardlink reporting 2002/03/25
5 Fix progress indicator to not corrupt log
7 Do not rely on having a group called "nobody"
8 Incorrect timestamps (Debian #100295)
11 FEATURES ------------------------------------------------------------
12 server-imposed bandwidth limits
14 Use chroot only if supported
15 Allow supplementary groups in rsyncd.conf 2002/04/09
16 Handling IPv6 on old machines
18 Add ACL support 2001/12/02
19 Lazy directory creation
20 Conditional -z for old protocols
21 proxy authentication 2002/01/23
24 Allow forcing arbitrary permissions 2002/03/12
25 --diff david.e.sewell 2002/03/15
26 Add daemon --no-detach and --no-fork options
28 DOCUMENTATION --------------------------------------------------------
30 Keep list of open issues and todos on the web site
31 Update web site from CVS
32 Perhaps redo manual as SGML
34 LOGGING --------------------------------------------------------------
35 Make dry run list all updates 2002/04/03
37 Improve error messages
38 Better statistics: Rasmus 2002/03/08
39 Perhaps flush stdout like syslog
40 Log daemon sessions that just list modules
41 Log child death on signal
42 Keep stderr and stdout properly separated (Debian #23626)
43 Log errors with function that reports process of origin
44 verbose output David Stein 2001/12/20
45 Add reason for transfer to file logging
46 debugging of daemon 2002/04/08
49 DEVELOPMENT --------------------------------------------------------
50 Handling duplicate names
51 Use generic zlib 2002/02/25
56 Add machines to build farm
58 PERFORMANCE ----------------------------------------------------------
59 File list structure in memory
60 Traverse just one directory at a time
62 Allow skipping MD4 file_sum 2002/04/08
66 TESTING --------------------------------------------------------------
68 Cross-test versions 2001/08/22
71 Create mutator program for testing
72 Create configure option to enable dangerous tests
73 If tests are skipped, say why.
74 Test daemon feature to disallow particular options.
75 Create pipe program for testing
76 Create test makefile target for some tests
77 Test "refuse options" works
79 RELATED PROJECTS -----------------------------------------------------
81 http://rsync.samba.org/rsync-and-debian/
83 rsyncsplit as alternative to real integration with gzip?
84 reverse rsync over HTTP Range
88 BUGS ---------------------------------------------------------------
90 Fix hardlink reporting 2002/03/25
91 (was: There seems to be a bug with hardlinks)
93 mbp/2 build$ ls -l /tmp/a /tmp/b -i
96 2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
97 2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
98 2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
99 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
100 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
101 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
102 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
103 2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
107 2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
108 2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
109 2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
110 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
111 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
112 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
113 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
114 2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
115 mbp/2 build$ rm -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
116 building file list ... done
117 created directory /tmp/b
123 wrote 350 bytes read 52 bytes 804.00 bytes/sec
124 total size is 232 speedup is 0.58
125 mbp/2 build$ rm -r /tmp/b
126 mbp/2 build$ ls -l /tmp/b
127 ls: /tmp/b: No such file or directory
128 mbp/2 build$ rm -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
129 rm: cannot remove `/tmp/b': No such file or directory
130 mbp/2 build$ rm -f -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
131 building file list ... done
132 created directory /tmp/b
138 wrote 350 bytes read 52 bytes 804.00 bytes/sec
139 total size is 232 speedup is 0.58
140 mbp/2 build$ ls -l /tmp/b
142 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
143 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
144 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
145 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
146 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
147 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
148 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
149 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
150 mbp/2 build$ ls -l /tmp/a
152 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
153 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
154 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
155 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
156 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
157 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
158 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
159 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
164 Fix progress indicator to not corrupt log
166 Progress indicator can produce corrupt output when transferring directories:
169 main/binary-arm/admin/
170 main/binary-arm/base/
171 main/binary-arm/comm/8.56kB/s 0:00:52
172 main/binary-arm/devel/
174 main/binary-arm/editors/
175 main/binary-arm/electronics/s 0:00:53
176 main/binary-arm/games/
177 main/binary-arm/graphics/
178 main/binary-arm/hamradio/
179 main/binary-arm/interpreters/
180 main/binary-arm/libs/6.61kB/s 0:00:54
181 main/binary-arm/mail/
182 main/binary-arm/math/
183 main/binary-arm/misc/
190 I don't think we handle this properly on systems that don't have the
191 call. Are there any such?
196 Do not rely on having a group called "nobody"
198 http://www.linuxbase.org/spec/refspecs/LSB_1.1.0/gLSB/usernames.html
200 On Debian it's "nogroup"
205 Incorrect timestamps (Debian #100295)
207 A bit hard to believe, but apparently it happens.
214 Don't detach, because this messes up --srvany.
216 http://sources.redhat.com/ml/cygwin/2001-08/msg00234.html
222 FEATURES ------------------------------------------------------------
224 server-imposed bandwidth limits
231 There are already some patches to do this.
233 BitKeeper uses a server whose login shell is set to bkd. That's
234 probably a reasonable approach.
239 Use chroot only if supported
241 If the platform doesn't support it, then don't even try.
243 If running as non-root, then don't fail, just give a warning.
244 (There was a thread about this a while ago?)
246 http://lists.samba.org/pipermail/rsync/2001-August/thread.html
247 http://lists.samba.org/pipermail/rsync/2001-September/thread.html
252 Allow supplementary groups in rsyncd.conf 2002/04/09
254 Perhaps allow supplementary groups to be specified in rsyncd.conf;
255 then make the first one the primary gid and all the rest be
261 Handling IPv6 on old machines
263 The KAME IPv6 patch is nice in theory but has proved a bit of a
264 nightmare in practice. The basic idea of their patch is that rsync
265 is rewritten to use the new getaddrinfo()/getnameinfo() interface,
266 rather than gethostbyname()/gethostbyaddr() as in rsync 2.4.6.
267 Systems that don't have the new interface are handled by providing
268 our own implementation in lib/, which is selectively linked in.
270 The problem with this is that it is really hard to get right on
271 platforms that have a half-working implementation, so redefining
272 these functions clashes with system headers, and leaving them out
273 breaks. This affects at least OSF/1, RedHat 5, and Cobalt, which
274 are moderately important.
276 Perhaps the simplest solution would be to have two different files
277 implementing the same interface, and choose either the new or the
278 old API. This is probably necessary for systems that e.g. have
279 IPv6, but gethostbyaddr() can't handle it. The Linux manpage claims
280 this is currently the case.
282 In fact, our internal sockets interface (things like
283 open_socket_out(), etc) is much narrower than the getaddrinfo()
284 interface, and so probably simpler to get right. In addition, the
285 old code is known to work well on old machines.
287 We could drop the rather large lib/getaddrinfo files.
294 Implement suggestions from http://www.kame.net/newsletter/19980604/
295 and ftp://ftp.iij.ad.jp/pub/RFC/rfc2553.txt
297 If a host has multiple addresses, then try to connect to all
298 in order until we get through. (getaddrinfo may return multiple
299 addresses.) This is kind of implemented already.
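
A minimal sketch of the try-each-address loop, in Python for brevity
rather than rsync's C. The name connect_any and the timeout value are
assumptions for illustration; only socket.getaddrinfo() and the plain
socket calls are real APIs.

```python
import socket

def connect_any(host, port, timeout=10.0):
    """Try each address getaddrinfo() returns, in order, until one connects."""
    last_error = None
    for family, socktype, proto, _canon, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            # For example, the address family is not supported here.
            last_error = exc
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(addr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses found for %r" % (host,))
```

The error from the last address tried is the one reported, which may
not be the most informative; collecting all of them is an option.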
301 Possibly also when starting as a server we may need to listen on
302 multiple passive addresses. This might be a bit harder, because we
303 may need to select on all of them. Hm.
305 Define a syntax for IPv6 literal addresses. Since they include
306 colons, they tend to break most naming systems, including ours.
307 Based on the HTTP IPv6 syntax, I think we should use
309 rsync://[::1]/foo/bar [::1]::bar
311 which should just take a small change to the parser code.
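
A rough sketch of what the parser change might look like, in Python
rather than C. The regular expressions and the name parse_target are
hypothetical and only cover the two daemon-style forms shown above;
this is not rsync's real parser.

```python
import re

# Hypothetical patterns: a bracketed IPv6 literal or a plain host name.
_HOST = r"(?:\[([0-9A-Fa-f:.]+)\]|([^/:\[\]]+))"
_URL = re.compile(r"^rsync://" + _HOST + r"(?::(\d+))?/(.*)$")
_DOUBLE_COLON = re.compile(r"^" + _HOST + r"::(.*)$")

def parse_target(spec):
    """Return (host, port, path) for a daemon-style spec; port may be None."""
    m = _URL.match(spec)
    if m:
        host = m.group(1) or m.group(2)
        port = int(m.group(3)) if m.group(3) else None
        return host, port, m.group(4)
    m = _DOUBLE_COLON.match(spec)
    if m:
        return m.group(1) or m.group(2), None, m.group(3)
    raise ValueError("not a daemon-style rsync target: %r" % (spec,))
```

Without the brackets, something like "::1::bar" is rejected, since
there is no way to tell where the address ends and the module begins.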
316 Add ACL support 2001/12/02
318 Transfer ACLs. Need to think of a standard representation.
319 Probably better not to even try to convert between NT and POSIX.
320 Possibly can share some code with Samba.
325 Lazy directory creation
327 With the current common --include '*/' --exclude '*' pattern, people
328 can end up with many empty directories. We might avoid this by
329 lazily creating such directories.
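
One way lazy creation could work, sketched in Python: directory
entries are skipped when seen, and a directory is only made when a
file is about to be written into it. The function names and the
(path, data) entry format are assumptions for illustration.

```python
import os

def write_file_lazily(dest_root, rel_path, data):
    """Create parent directories only when a file is actually written."""
    full = os.path.join(dest_root, rel_path)
    parent = os.path.dirname(full)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)

def sync_selected(dest_root, entries):
    """entries: iterable of (relative path, bytes, or None for a directory)."""
    for rel_path, data in entries:
        if data is None:
            continue  # directory entry: defer until a file needs it
        write_file_lazily(dest_root, rel_path, data)
```

A real implementation would still need to apply directory permissions
and times once a deferred directory is finally created.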
334 Conditional -z for old protocols
336 After we get the @RSYNCD greeting from the server, we know its
337 version but we have not yet sent the command line, so we could just
338 remove the -z option if the server is too old.
340 For ssh invocation it's not so simple, because we actually use the
341 command line to start the remote process. However, we only actually
342 do compression in token.c, so once we discover the remote version
343 we could emit an error if it is too old. I'm not sure if
344 that's a good tradeoff or not.
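
For the daemon case, the idea might be sketched like this in Python.
The threshold MIN_COMPRESS_PROTOCOL is a placeholder, since these
notes do not say which protocol version is "too old", and the
function names are hypothetical.

```python
import re

# Placeholder value: the real cut-off version is not stated in this file.
MIN_COMPRESS_PROTOCOL = 20

def parse_greeting(line):
    """Extract the protocol version from a line such as '@RSYNCD: 26'."""
    m = re.match(r"^@RSYNCD: (\d+)", line)
    if not m:
        raise ValueError("unexpected greeting: %r" % (line,))
    return int(m.group(1))

def filter_options(options, remote_version,
                   min_version=MIN_COMPRESS_PROTOCOL):
    """Drop -z/--compress when the server is older than min_version."""
    if remote_version >= min_version:
        return list(options)
    return [o for o in options if o not in ("-z", "--compress")]
```

Bundled short options such as "-avz" are not handled here; the real
code would adjust its option state rather than edit strings.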
349 proxy authentication 2002/01/23
351 Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
352 HTTP Basic Proxy-Authentication.
354 Multiple schemes are possible, up to and including the insanity that
355 is NTLM, but Basic probably covers most cases.
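
A small Python sketch of the Basic case: split the proposed
RSYNC_PROXY URL form and build the Proxy-Authorization header. The
function name proxy_settings is an assumption; the existing plain
host:port form is accepted as well.

```python
import base64
from urllib.parse import unquote, urlsplit

def proxy_settings(rsync_proxy):
    """Split an RSYNC_PROXY value into (host, port, auth header or None)."""
    if "://" not in rsync_proxy:
        rsync_proxy = "http://" + rsync_proxy  # plain host:port form
    parts = urlsplit(rsync_proxy)
    header = None
    if parts.username is not None:
        creds = "%s:%s" % (unquote(parts.username),
                           unquote(parts.password or ""))
        token = base64.b64encode(creds.encode("utf-8")).decode("ascii")
        header = "Proxy-Authorization: Basic " + token
    return parts.hostname, parts.port, header
```

The header would be sent along with the CONNECT request to the proxy.
Basic credentials are only encoded, not encrypted.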
362 Add --with-socks, and then perhaps a command-line option to put them
363 on or off. This might be more reliable than LD_PRELOAD hacks.
370 rsync to a FAT partition on a Unix machine doesn't work very well at
371 the moment. I think we get errors about invalid filenames and
372 perhaps also trying to do atomic renames.
374 I guess the code to do this is currently #ifdef'd on Windows;
375 perhaps we ought to intelligently fall back to it on Unix too.
380 Allow forcing arbitrary permissions 2002/03/12
382 On 12 Mar 2002, Dave Dykstra <dwd@bell-labs.com> wrote:
383 > If we would add an option to do that functionality, I
384 > would vote for one that was more general which could mask
385 > off any set of permission bits and possibly add any set of
386 > bits. Perhaps a chmod-like syntax if it could be
387 > implemented simply.
389 I think that would be good too. For example, people uploading files
390 to a web server might like to say
392 rsync -avzP --chmod a+rX ./ sourcefrog.net:/home/www/sourcefrog/
394 Ideally the patch would implement as many of the GNU chmod semantics
395 as possible. I think the mode parser should be a separate function
396 that passes back something like (mask,set) description to the rest
397 of the program. For bonus points there would be a test case for the
400 Possibly also --chown
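
A sketch of the separate mode parser idea, in Python. It handles only
a subset of chmod syntax (who = ugoa, ops = + - =, perms = rwxX) and
hands back a (mask, set) pair per file so that the new mode is
(mode & mask) | set. The names parse_chmod and mask_and_set are
hypothetical, and an empty "who" is treated as "a" without consulting
the umask, which is a simplification.

```python
WHO = {"u": 0o700, "g": 0o070, "o": 0o007, "a": 0o777}
PERM = {"r": 0o444, "w": 0o222, "x": 0o111}

def parse_chmod(spec):
    """Parse e.g. 'a+rX,go-w' into a list of (who_bits, op, perms)."""
    clauses = []
    for clause in spec.split(","):
        i, who = 0, 0
        while i < len(clause) and clause[i] in WHO:
            who |= WHO[clause[i]]
            i += 1
        if who == 0:
            who = 0o777  # assumption: no "who" means "a", umask ignored
        if i >= len(clause) or clause[i] not in "+-=":
            raise ValueError("bad mode clause: %r" % (clause,))
        op, perms = clause[i], clause[i + 1:]
        if any(p not in "rwxX" for p in perms):
            raise ValueError("bad permission in: %r" % (clause,))
        clauses.append((who, op, perms))
    return clauses

def mask_and_set(clauses, mode, is_dir):
    """Return (mask, set) such that new_mode == (mode & mask) | set."""
    mask, bits = 0o7777, 0
    for who, op, perms in clauses:
        current = (mode & mask) | bits
        want = 0
        for p in perms:
            if p == "X":
                # X: execute only for directories or already-executable files
                if is_dir or current & 0o111:
                    want |= 0o111
            else:
                want |= PERM[p]
        want &= who
        if op == "+":
            bits |= want
        elif op == "-":
            mask &= ~want
            bits &= ~want
        else:  # "=": clear everything for "who", then set what was asked
            mask &= ~who
            bits = (bits & ~who) | want
    return mask & 0o7777, bits
```

Because X depends on the file, the pair is computed per file rather
than once per run. For the web-server example, a+rX turns 0600 into
0644 for a plain file and 0700 into 0755 for a directory.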
407 --diff david.e.sewell 2002/03/15
409 Allow people to specify the diff command. (Might want to use wdiff,
412 Just diff the temporary file with the destination file, and delete
413 the tmp file rather than moving it into place.
415 Interaction with --partial.
417 Security interactions with daemon mode?
422 Add daemon --no-detach and --no-fork options
424 Very useful for debugging. Also good when running under a
425 daemon-monitoring process that tries to restart the service when the
430 DOCUMENTATION --------------------------------------------------------
437 Keep list of open issues and todos on the web site
442 Update web site from CVS
447 Perhaps redo manual as SGML
449 The man page is getting rather large, and there is more information
450 that ought to be added.
452 Texinfo source is probably a dying format.
454 Linuxdoc looks like the most likely contender. I know DocBook is
455 favoured by some people, but it's so bloody verbose, even with emacs
460 LOGGING --------------------------------------------------------------
462 Make dry run list all updates 2002/04/03
466 Mark Santcroos points out that -n fails to list files which have
467 only metadata changes, though it probably should.
469 There may be a Debian bug about this as well.
476 At exit, show how much memory was used for the file list, etc.
478 Also we do a weird exponential-growth allocation in flist.c. I'm
479 not sure this makes sense with modern mallocs. At any rate it will
480 make us allocate a huge amount of memory for large file lists.
485 Improve error messages
487 If we hang or get SIGINT, then explain where we were up to. Perhaps
488 have a static buffer that contains the current function name, or
489 some kind of description of what we were trying to do. This is a
490 little easier on people than needing to run strace/truss.
492 "The dungeon collapses! You are killed." Rather than "unexpected
493 eof" give a message that is more detailed if possible and also more
496 If we get an error writing to a socket, then we should perhaps
497 continue trying to read to see if an error message comes across
498 explaining why the socket is closed. I'm not sure if this would
499 work, but it would certainly make our messages more helpful.
501 What happens if a directory is missing -x attributes? Do we lose
502 our load? (Debian #28416) Probably fixed now, but a test case would
510 Better statistics: Rasmus 2002/03/08
513 hey, how about an rsync option that just gives you the
514 summary without the list of files? And perhaps gives
515 more information like the number of new files, number
516 of changed, deleted, etc. ?
519 nice idea there is --stats but at the moment it's very
520 tridge-oriented rather than user-friendly it would be
521 nice to improve it that would also work well with
527 Perhaps flush stdout like syslog
529 Perhaps flush stdout after each filename, so that people trying to
530 monitor progress in a log file can do so more easily. See
531 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
536 Log daemon sessions that just list modules
538 At the moment, connections that just get a list of modules are not logged.
544 Log child death on signal
546 If a child of the rsync daemon dies with a signal, we should notice
547 that when we reap it and log a message.
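
The reaping step might look roughly like this, sketched in Python
with the same wait-status macros C provides. The function names and
the message wording are assumptions; log is any callable that takes
a string.

```python
import os
import signal

def describe_child_exit(pid, status):
    """Turn a waitpid() status into a log line; None if nothing to report."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return "child %d killed by signal %d (%s)" % (pid, signum, name)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) != 0:
        return "child %d exited with status %d" % (
            pid, os.WEXITSTATUS(status))
    return None

def reap_children(log):
    """Reap every finished child without blocking, logging abnormal exits."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left
        if pid == 0:
            return  # children exist, but none have exited yet
        message = describe_child_exit(pid, status)
        if message:
            log(message)
```

reap_children() would typically be called from the daemon's main
loop after SIGCHLD, not from inside the signal handler itself.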
552 Keep stderr and stdout properly separated (Debian #23626)
557 Log errors with function that reports process of origin
559 Use a separate function for reporting errors; prefix it with
560 "rsync:" or "rsync(remote)", or perhaps even "rsync(local
566 verbose output David Stein 2001/12/20
568 Indicate whether files are new, updated, or deleted
570 At end of transfer, show how many files were or were not transferred
576 Add reason for transfer to file logging
578 Explain *why* every file is transferred or not (e.g. "local mtime
579 123123 newer than 1283198")
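
A sketch of how a decision and its reason could be produced
together, in Python. It assumes the decision is based only on size
and mtime; the function name and the exact wording of the reasons
are made up for illustration.

```python
def transfer_reason(local, remote):
    """Explain a transfer decision.

    local and remote are (size, mtime) tuples; remote is None if the
    destination file does not exist. Returns (transfer?, reason).
    """
    if remote is None:
        return True, "destination file does not exist"
    local_size, local_mtime = local
    remote_size, remote_mtime = remote
    if local_size != remote_size:
        return True, "local size %d differs from %d" % (
            local_size, remote_size)
    if local_mtime > remote_mtime:
        return True, "local mtime %d newer than %d" % (
            local_mtime, remote_mtime)
    if local_mtime < remote_mtime:
        return True, "local mtime %d older than %d" % (
            local_mtime, remote_mtime)
    return False, "size and mtime match"
```

Returning the reason from the same function that makes the decision
keeps the log message from drifting out of step with the logic.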
584 debugging of daemon 2002/04/08
586 Add an rsyncd.conf parameter to turn on debugging on the server.
593 Change to using gettext(). Probably need to ship this for platforms
596 Solicit translations.
598 Does anyone care? Before we bother modifying the code, we ought to
599 get the manual translated first, because that's possibly more useful
600 and at any rate demonstrates desire.
604 DEVELOPMENT --------------------------------------------------------
606 Handling duplicate names
608 We need to be careful of duplicate names getting into the file list.
609 See clean_flist(). This could happen if multiple arguments include
612 I think duplicates are only a problem if they're both flowing
613 through the pipeline at the same time. For example we might have
614 updated the first occurrence after reading the checksums for the
615 second. So possibly we just need to make sure that we don't have
616 both in the pipeline at the same time.
618 Possibly if we did one directory at a time that would be sufficient.
620 Alternatively we could pre-process the arguments to make sure no
621 duplicates will ever be inserted. There could be some bad cases
622 when we're collapsing symlinks.
624 We could have a hash table.
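
A hash-table approach might be sketched like this in Python, where
a set plays the role of the table. The name dedupe_names is
hypothetical, and comparing names after os.path.normpath() is an
assumption about what should count as "the same name".

```python
import os

def dedupe_names(names):
    """Drop repeated names, keeping first occurrences in order.

    Names are compared after os.path.normpath(), so 'a/./b' and
    'a//b' both collapse to 'a/b'. Symlinks are NOT resolved.
    """
    seen = set()  # the hash table
    unique = []
    for name in names:
        key = os.path.normpath(name)
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique
```

Unlike sorting and comparing neighbours, this keeps the original
order and does not need the whole list before it can start. Note
that normpath() also collapses "a/../b" to "b", which is wrong when
"a" is a symlink, so a real version would need more care there.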
626 The root of the problem is that we do not want more than one file
627 list entry referring to the same file. At first glance there are
628 several ways this could happen: symlinks, hardlinks, and repeated
629 names on the command line.
631 If names are repeated on the command line, they may be present in
632 different forms, perhaps by traversing directory paths in different
633 ways, traversing paths including symlinks. Also we need to allow
634 for expansion of globs by rsync.
636 At the moment, clean_flist() requires having the entire file list in
637 memory. Duplicate names are detected just by a string comparison.
639 We don't need to worry about hard links causing duplicates because
640 files are never updated in place. Similarly for symlinks.
642 I think even if we're using a different symlink mode we don't need
645 Unless we're really clever this will introduce a protocol
646 incompatibility, so we need to be able to accept the old format as
652 Use generic zlib 2002/02/25
654 Perhaps don't use our own zlib.
658 - will automatically be up to date with bugfixes in zlib
660 - can leave it out for small rsync on e.g. recovery disks
662 - can use a shared library
664 - avoids people breaking rsync by trying to do this themselves and
667 Should we ship zlib for systems that don't have it, or require
668 people to install it separately?
670 Apparently this will make us incompatible with versions of rsync
671 that use the patched version of zlib. Probably the simplest way to
672 do this is to just disable gzip (with a warning) when talking to old
680 Rather than storing the file list in memory, store it in a TDB.
682 This *might* make memory usage lower while building the file list.
684 Hashtable lookup will mean files are not transmitted in order,
687 This would neatly eliminate one of the major post-fork shared data
695 Build rsync with SPLINT to try to find security holes. Add
696 annotations as necessary. Keep track of the number of warnings
697 found initially, and see how many of them are real bugs, or real
698 security bugs. Knowing the percentage of likely hits would be
699 really interesting for other projects.
706 jra recommends Valgrind:
708 http://devel-home.kde.org/~sewardj/
713 Create release script
719 Build tar file; upload
721 Send announcement to mailing list and c.o.l.a.
723 Make freshmeat announcement
730 Add machines to build farm
732 Cygwin (on different versions of Win32?)
734 HP-UX variants (via HP?)
742 PERFORMANCE ----------------------------------------------------------
744 File list structure in memory
746 Rather than one big array, perhaps have a tree in memory mirroring
749 This might make sorting much faster! (I'm not sure it's a big CPU
752 It might also reduce memory use in storing repeated directory names
753 -- again I'm not sure this is a problem.
758 Traverse just one directory at a time
760 Traverse just one directory at a time. Tridge says it's possible.
762 At the moment rsync reads the whole file list into memory at the
763 start, which makes us use a lot of memory and also not pipeline
764 network access as much as we could.
771 At the moment hardlink handling is very expensive, so it's off by
772 default. It does not need to be so.
774 Since most of the solutions are rather intertwined with the file
775 list it is probably better to fix that first, although fixing
776 hardlinks is possibly simpler.
778 We can rule out hardlinked directories since they will probably
779 screw us up in all kinds of ways. They simply should not be used.
781 At the moment rsync only cares about hardlinks to regular files. I
782 guess you could also use them for sockets, devices and other beasts,
783 but I have not seen them.
785 When trying to reproduce hard links, we only need to worry about
786 files that have more than one name (nlinks>1 && !S_ISDIR).
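
The nlinks>1 && !S_ISDIR test, combined with a (device, inode) key,
can be sketched in Python as below. The function name is an
assumption, and this works on local paths only; it says nothing
about how the information travels over the wire.

```python
import os
import stat
from collections import OrderedDict

def hardlink_groups(paths):
    """Group names that refer to the same multiply-linked, non-directory file.

    Returns a list of lists; the first name in each list is the one
    that would be transferred, the rest would simply be linked to it.
    """
    groups = OrderedDict()
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink > 1 and not stat.S_ISDIR(st.st_mode):
            groups.setdefault((st.st_dev, st.st_ino), []).append(path)
    # Files whose other links lie outside the given paths end up in
    # groups of one; nothing needs linking for those.
    return [names for names in groups.values() if len(names) > 1]
```

Files with a single link never enter the table, so the cost is
proportional to the number of multiply-linked names, not to the
size of the whole file list.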
788 The basic point of this is to discover alternate names that refer to
789 the same file. All operations, including creating the file and
790 writing modifications to it need only to be done for the first name.
791 For all later names, we just create the link and then leave it
794 If hard links are to be preserved:
796 Before the generator/receiver fork, the list of files is received
797 from the sender (recv_file_list), and a table for detecting hard
800 The generator looks for hard links within the file list and does
801 not send checksums for them, though it does send other metadata.
803 The sender sends the device number and inode with file entries, so
804 that files are uniquely identified.
806 The receiver goes through and creates hard links (do_hard_links)
807 after all data has been written, but before directory permissions
810 At the moment device and inum are sent as 4-byte integers, which
811 will probably cause problems on large filesystems. On Linux the
812 kernel uses 64-bit ino_t's internally, and people will soon have
813 filesystems big enough to use them. We ought to follow NFS4 in
814 using 64-bit device and inode identification, perhaps with a
815 protocol version bump.
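
To illustrate the problem, here is a small Python sketch that packs
(device, inode) pairs both ways. The little-endian layout and the
helper names are assumptions for illustration, not a description of
the actual wire format.

```python
import struct

def pack_id32(dev, ino):
    """Pack (dev, ino) as two 4-byte integers, discarding the high bits."""
    return struct.pack("<II", dev & 0xFFFFFFFF, ino & 0xFFFFFFFF)

def pack_id64(dev, ino):
    """Pack (dev, ino) as two 8-byte integers."""
    return struct.pack("<QQ", dev, ino)

# Two distinct inode numbers that differ only above bit 31.
small = 2568307
large = 2568307 + (1 << 32)

collides_32 = pack_id32(1, small) == pack_id32(1, large)
collides_64 = pack_id64(1, small) == pack_id64(1, large)
```

With 4-byte fields the two files look identical and would be
mistaken for hard links to each other; with 8-byte fields they stay
distinct, at a cost of 8 more bytes per entry.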
817 Once we've seen all the names for a particular file, we no longer
818 need to think about it and we can deallocate the memory.
820 We can also have the case where there are links to a file that are
821 not in the tree being transferred. There's nothing we can do about
822 that. Because we rename the destination into place after writing,
823 any hardlinks to the old file are always going to be orphaned. In
824 fact that is almost necessary because otherwise we'd get really
825 confused if we were generating checksums for one name of a file and
828 At the moment the code seems to make a whole second copy of the file
829 list, which seems unnecessary.
831 We should have a test case that exercises hard links. Since it
832 might be hard to compare ./tls output where the inodes change we
833 might need a little program to check whether several names refer to
839 Allow skipping MD4 file_sum 2002/04/08
841 If we're doing a local transfer, or using -W, then perhaps don't
842 send the file checksum. If we're doing a local transfer, then
843 calculating MD4 checksums uses 90% of CPU and is unlikely to be
846 Indeed for transfers over zlib or ssh we can also rely on the
847 transport to have quite strong protection against corruption.
849 Perhaps we should have an option to disable this,
850 analogous to --whole-file, although it would default to
851 disabled. The file checksum takes up a definite space in
852 the protocol -- we can either set it to 0, or perhaps just
860 Perhaps borrow an assembler MD4 from someone?
862 Make sure we call MD4 with properly-sized blocks whenever possible
863 to avoid copying into the residue region?
870 Test whether this is actually faster than just using malloc(). If
871 it's not (anymore), throw it out.
875 TESTING --------------------------------------------------------------
879 Something that just keeps running rsync continuously over a data set
880 likely to generate problems.
885 Cross-test versions 2001/08/22
887 Part of the regression suite should be making sure that we
888 don't break backwards compatibility: old clients vs new
889 servers and so on. Ideally we would test both up and down
890 from the current release to all old versions.
892 Run current rsync versions against significant past releases.
894 We might need to omit broken old versions, or versions in which
895 particular functionality is broken
897 It might be sufficient to test downloads from well-known public
898 rsync servers running different versions of rsync. This will give
899 some testing and also be the most common case for having different
900 versions and not being able to upgrade.
902 The new --protocol option may help in this.
907 Test on kernel source
909 Download all versions of kernel; unpack, sync between them. Also
910 sync between uncompressed tarballs. Compare directories after
913 Use local mode; ssh; daemon; --whole-file and --no-whole-file.
915 Use awk to pull out the 'speedup' number for each transfer. Make
923 Sparse and non-sparse
928 Create mutator program for testing
930 Insert bytes, delete bytes, swap blocks, ...
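
A possible shape for the mutator, sketched in Python. All names,
the default sizes and the use of a seed are assumptions; the seed is
there so that a failing test run can be reproduced.

```python
import random

def insert_bytes(data, rng, count=16):
    """Insert count random bytes at a random position."""
    pos = rng.randrange(len(data) + 1)
    junk = bytes(rng.randrange(256) for _ in range(count))
    return data[:pos] + junk + data[pos:]

def delete_bytes(data, rng, count=16):
    """Delete up to count bytes starting at a random position."""
    count = min(count, len(data))
    pos = rng.randrange(len(data) - count + 1)
    return data[:pos] + data[pos + count:]

def swap_blocks(data, rng, block=16):
    """Swap two distinct, non-overlapping, block-aligned blocks."""
    nblocks = len(data) // block
    if nblocks < 2:
        return data
    i, j = sorted(rng.sample(range(nblocks), 2))
    a, b = i * block, j * block
    return (data[:a] + data[b:b + block] + data[a + block:b]
            + data[a:a + block] + data[b + block:])

def mutate(data, seed, steps=10):
    """Apply steps randomly chosen mutations, reproducibly for a seed."""
    rng = random.Random(seed)
    ops = (insert_bytes, delete_bytes, swap_blocks)
    for _ in range(steps):
        data = rng.choice(ops)(data, rng)
    return data
```

A test would write the original file, sync it, write mutate(original,
seed), sync again and compare the destination byte for byte.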
935 Create configure option to enable dangerous tests
940 If tests are skipped, say why.
945 Test daemon feature to disallow particular options.
950 Create pipe program for testing
952 Create pipe program that makes slow/jerky connections for
953 testing. Versions of read() and write() that corrupt the
954 stream, or abruptly fail.
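
The slow/jerky and abrupt-failure behaviour might be sketched like
this in Python. The names, the chunk size and the fail_after
parameter are assumptions; read and write are any callables with
file-like semantics, so the same code can sit between two sockets
or two pipes.

```python
import random
import time

def jerky_chunks(data, rng, max_chunk=7):
    """Split data into randomly sized pieces of 1..max_chunk bytes."""
    pos = 0
    while pos < len(data):
        size = rng.randint(1, max_chunk)
        yield data[pos:pos + size]
        pos += size

def relay(read, write, seed=0, max_delay=0.05, fail_after=None):
    """Copy from read() to write() in small delayed bursts.

    read(n) returns up to n bytes (b"" at EOF); write(b) sends bytes.
    If fail_after is set, raise ConnectionError before more than that
    many bytes have been relayed, to imitate an abrupt failure.
    Returns the number of bytes relayed.
    """
    rng = random.Random(seed)
    sent = 0
    while True:
        block = read(4096)
        if not block:
            return sent
        for chunk in jerky_chunks(block, rng):
            if fail_after is not None and sent + len(chunk) > fail_after:
                raise ConnectionError("simulated abrupt failure")
            time.sleep(rng.uniform(0, max_delay))
            write(chunk)
            sent += len(chunk)
```

Corrupting the stream could be added the same way, by flipping a
byte in a chunk before it is written.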
959 Create test makefile target for some tests
961 Separate makefile target to run rough tests -- or perhaps
962 just run them every time?
967 Test "refuse options" works
969 What about for --recursive?
971 If you specify an unrecognized option here, you should get an error.
973 We need a test case for this...
975 Was this broken when we changed to popt?
979 RELATED PROJECTS -----------------------------------------------------
983 Write a small emulation of interactive ftp as a Python program
984 that calls rsync. Commands such as "cd", "ls", "ls *.c" etc map
985 fairly directly into rsync commands: it just needs to remember the
986 current host, directory and so on. We can probably even do
987 completion of remote filenames.
992 http://rsync.samba.org/rsync-and-debian/
1000 Exhaustive, tortuous testing
1007 rsyncsplit as alternative to real integration with gzip?
1012 reverse rsync over HTTP Range
1014 Goswin Brederlow suggested this on Debian; I think tridge and I
1015 talked about it previously in relation to rproxy.