X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/blobdiff_plain/b3e6c8156529f78b097820ff964bff3e14753286..43a4dc1053bd3bfec67f3cc6a6fa4edc1f394a82:/TODO

diff --git a/TODO b/TODO
index cb187126..42439806 100644
--- a/TODO
+++ b/TODO
@@ -32,6 +32,17 @@ use chroot
   for people who want to generate the file list using a find(1)
   command or a script.
 
+File list structure in memory
+
+  Rather than one big array, perhaps have a tree in memory mirroring
+  the directory tree.
+
+  This might make sorting much faster! (I'm not sure it's a big CPU
+  problem, mind you.)
+
+  It might also reduce memory use in storing repeated directory names
+  -- again I'm not sure this is a problem.
+
 Performance
 
   Traverse just one directory at a time. Tridge says it's possible.
@@ -40,9 +51,12 @@ Performance
   start, which makes us use a lot of memory and also not pipeline
   network access as much as we could.
 
+
+Handling duplicate names
+
   We need to be careful of duplicate names getting into the file list.
-  See clean_flist. This could happen if multiple arguments include
-  the same file. Bad.
+  See clean_flist(). This could happen if multiple arguments include
+  the same file. Bad.
 
   I think duplicates are only a problem if they're both flowing
   through the pipeline at the same time. For example we might have
@@ -53,10 +67,35 @@ Performance
   Possibly if we did one directory at a time that would be sufficient.
 
   Alternatively we could pre-process the arguments to make sure no
-  duplicates will ever be inserted.
+  duplicates will ever be inserted. There could be some bad cases
+  when we're collapsing symlinks.
 
   We could have a hash table.
 
+  The root of the problem is that we do not want more than one file
+  list entry referring to the same file. At first glance there are
+  several ways this could happen: symlinks, hardlinks, and repeated
+  names on the command line.
+
+  If names are repeated on the command line, they may be present in
+  different forms, perhaps by traversing directory paths in different
+  ways, traversing paths including symlinks. Also we need to allow
+  for expansion of globs by rsync.
+
+  At the moment, clean_flist() requires having the entire file list in
+  memory. Duplicate names are detected just by a string comparison.
+
+  We don't need to worry about hard links causing duplicates because
+  files are never updated in place. Similarly for symlinks.
+
+  I think even if we're using a different symlink mode we don't need
+  to worry.
+
+  Unless we're really clever this will introduce a protocol
+  incompatibility, so we need to be able to accept the old format as
+  well.
+
+
 Memory accounting
 
   At exit, show how much memory was used for the file list, etc.
@@ -65,11 +104,16 @@ Memory accounting
   not sure this makes sense with modern mallocs. At any rate it will
   make us allocate a huge amount of memory for large file lists.
 
+
 Hard-link handling
 
   At the moment hardlink handling is very expensive, so it's off by
   default. It does not need to be so.
 
+  Since most of the solutions are rather intertwined with the file
+  list it is probably better to fix that first, although fixing
+  hardlinks is possibly simpler.
+
   We can rule out hardlinked directories since they will probably
   screw us up in all kinds of ways. They simply should not be used.
 
@@ -176,10 +220,30 @@ Empty directories
   can end up with many empty directories. We might avoid this by
   lazily creating such directories.
 
+
 zlib
 
-  Perhaps don't use our own zlib. Will we actually be incompatible,
-  or just be slightly less efficient?
+  Perhaps don't use our own zlib.
+
+  Advantages:
+
+  - will automatically be up to date with bugfixes in zlib
+
+  - can leave it out for small rsync on e.g. recovery disks
+
+  - can use a shared library
+
+  - avoids people breaking rsync by trying to do this themselves and
+    messing up
+
+  Should we ship zlib for systems that don't have it, or require
+  people to install it separately?
+
+  Apparently this will make us incompatible with versions of rsync
+  that use the patched version of rsync. Probably the simplest way to
+  do this is to just disable gzip (with a warning) when talking to old
+  versions.
+
 
 logging
 
@@ -187,10 +251,52 @@ logging
   monitor progress in a log file can do so more easily. See
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
 
+  At the connections that just get a list of modules are not logged,
+  but they should be.
+
 rsyncd over ssh
 
   There are already some patches to do this.
 
+proxy authentication
+
+  Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
+  HTTP Basic Proxy-Authentication.
+
+  Multiple schemes are possible, up to and including the insanity that
+  is NTLM, but Basic probably covers most cases.
+
+SOCKS
+
+  Add --with-socks, and then perhaps a command-line option to put them
+  on or off. This might be more reliable than LD_PRELOAD hacks.
+
+Better statistics:
+
+  mbp: hey, how about an rsync option that just gives you the
+  summary without the list of files? And perhaps gives more
+  information like the number of new files, number of changed,
+  deleted, etc. ?
+  Rasmus: nice idea
+  there is --stats
+  but at the moment it's very tridge-oriented
+  rather than user-friendly
+  it would be nice to improve it
+  that would also work well with --dryrun
+
+TDB:
+
+  Rather than storing the file list in memory, store it in a TDB.
+
+  This *might* make memory usage lower while building the file list.
+
+  Hashtable lookup will mean files are not transmitted in order,
+  though... hm.
+
+  This would neatly eliminate one of the major post-fork shared data
+  structures.
+
+
 PLATFORMS ------------------------------------------------------------
 
 Win32
@@ -206,6 +312,31 @@ Win32
   we are correct to call close(), because shutdown() discards
   untransmitted data.
 
+DEVELOPMENT ----------------------------------------------------------
+
+Splint
+
+  Build rsync with SPLINT to try to find security holes. Add
+  annotations as necessary. Keep track of the number of warnings
+  found initially, and see how many of them are real bugs, or real
+  security bugs. Knowing the percentage of likely hits would be
+  really interesting for other projects.
+
+Torture test
+
+  Something that just keeps running rsync continuously over a data set
+  likely to generate problems.
+
+Cross-testing
+
+  Run current rsync versions against significant past releases.
+
+Memory debugger
+
+  jra recommends:
+
+  http://devel-home.kde.org/~sewardj/
+
 DOCUMENTATION --------------------------------------------------------
 
 Update README
@@ -224,10 +355,6 @@ Add machines
 
 NICE -----------------------------------------------------------------
 
-SIGHUP
-
-  Re-read config file (just exec() ourselves) rather than exiting.
-
 --no-detach and --no-fork options
 
   Very useful for debugging. Also good when running under a
@@ -236,12 +363,13 @@ SIGHUP
 
 hang/timeout friendliness
 
-  On
-
 verbose output
 
   Indicate whether files are new, updated, or deleted
 
+  At end of transfer, show how many files were or were not transferred
+  correctly.
+
 internationalization
 
   Change to using gettext(). Probably need to ship this for platforms
@@ -258,5 +386,3 @@ rsyncsh
   fairly directly into rsync commands: it just needs to remember the
   current host, directory and so on. We can probably even do
   completion of remote filenames.
-
-%K%