X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/blobdiff_plain/a6a3c3df453f0551e68f08ef3a15d015848b8695..bb7c4fa3612a6202e92611acca7f25c0d8bcf799:/TODO

diff --git a/TODO b/TODO
index 75d4e56a..5d720d38 100644
--- a/TODO
+++ b/TODO
@@ -32,6 +32,17 @@ use chroot
   for people who want to generate the file list using a find(1)
   command or a script.
 
+File list structure in memory
+
+  Rather than one big array, perhaps have a tree in memory mirroring
+  the directory tree.
+
+  This might make sorting much faster!  (I'm not sure it's a big CPU
+  problem, mind you.)
+
+  It might also reduce memory use in storing repeated directory names
+  -- again I'm not sure this is a problem.
+
 Performance
 
   Traverse just one directory at a time.  Tridge says it's possible.
@@ -40,15 +51,69 @@ Performance
   start, which makes us use a lot of memory and also not pipeline
   network access as much as we could.
 
+
+Handling duplicate names
+
+  We need to be careful of duplicate names getting into the file list.
+  See clean_flist().  This could happen if multiple arguments include
+  the same file.  Bad.
+
+  I think duplicates are only a problem if they're both flowing
+  through the pipeline at the same time.  For example we might have
+  updated the first occurrence after reading the checksums for the
+  second.  So possibly we just need to make sure that we don't have
+  both in the pipeline at the same time.
+
+  Possibly if we did one directory at a time that would be sufficient.
+
+  Alternatively we could pre-process the arguments to make sure no
+  duplicates will ever be inserted.  There could be some bad cases
+  when we're collapsing symlinks.
+
+  We could have a hash table.
+
+  The root of the problem is that we do not want more than one file
+  list entry referring to the same file.  At first glance there are
+  several ways this could happen: symlinks, hardlinks, and repeated
+  names on the command line.
+
+  If names are repeated on the command line, they may be present in
+  different forms, perhaps by traversing directory paths in different
+  ways, traversing paths including symlinks.  Also we need to allow
+  for expansion of globs by rsync.
+
+  At the moment, clean_flist() requires having the entire file list in
+  memory.  Duplicate names are detected just by a string comparison.
+
+  We don't need to worry about hard links causing duplicates because
+  files are never updated in place.  Similarly for symlinks.
+
+  I think even if we're using a different symlink mode we don't need
+  to worry.
+
+  Unless we're really clever this will introduce a protocol
+  incompatibility, so we need to be able to accept the old format as
+  well.
+
+
 Memory accounting
 
   At exit, show how much memory was used for the file list, etc.
 
+  Also we do a wierd exponential-growth allocation in flist.c.  I'm
+  not sure this makes sense with modern mallocs.  At any rate it will
+  make us allocate a huge amount of memory for large file lists.
+
+
 Hard-link handling
 
   At the moment hardlink handling is very expensive, so it's off by
   default.  It does not need to be so.
 
+  Since most of the solutions are rather intertwined with the file
+  list it is probably better to fix that first, although fixing
+  hardlinks is possibly simpler.
+
   We can rule out hardlinked directories since they will probably
   screw us up in all kinds of ways.  They simply should not be used.
 
@@ -166,10 +231,26 @@ logging
   monitor progress in a log file can do so more easily.  See
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
 
+  At the connections that just get a list of modules are not logged,
+  but they should be.
+
 rsyncd over ssh
 
   There are already some patches to do this.
 
+proxy authentication
+
+  Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
+  HTTP Basic Proxy-Authentication.
+
+  Multiple schemes are possible, up to and including the insanity that
+  is NTLM, but Basic probably covers most cases.
+
+SOCKS
+
+  Add --with-socks, and then perhaps a command-line option to put them
+  on or off.  This might be more reliable than LD_PRELOAD hacks.
+
 PLATFORMS ------------------------------------------------------------
 
 Win32
@@ -203,10 +284,6 @@ Add machines
 
 NICE -----------------------------------------------------------------
 
-SIGHUP
-
-  Re-read config file (just exec() ourselves) rather than exiting.
-
 --no-detach and --no-fork options
 
   Very useful for debugging.  Also good when running under a
@@ -215,8 +292,6 @@ SIGHUP
 
 hang/timeout friendliness
 
-  On
-
 verbose output
 
   Indicate whether files are new, updated, or deleted
@@ -238,3 +313,4 @@ rsyncsh
   current host, directory and so on.  We can probably even do
   completion of remote filenames.
 
+%K%