X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/blobdiff_plain/b3e6c8156529f78b097820ff964bff3e14753286..0e5a1f8352635a291be1ac5afe954a76dbc1664e:/TODO

diff --git a/TODO b/TODO
index cb187126..988e1f03 100644
--- a/TODO
+++ b/TODO
@@ -32,6 +32,7 @@ use chroot
   for people who want to generate the file list using a find(1)
   command or a script.
 
+
 Performance
 
   Traverse just one directory at a time. Tridge says it's possible.
@@ -40,9 +41,12 @@ Performance
   start, which makes us use a lot of memory and also not pipeline
   network access as much as we could.
 
+
+Handling duplicate names
+
   We need to be careful of duplicate names getting into the file list.
-  See clean_flist. This could happen if multiple arguments include
-  the same file. Bad.
+  See clean_flist(). This could happen if multiple arguments include
+  the same file. Bad.
 
   I think duplicates are only a problem if they're both flowing
   through the pipeline at the same time. For example we might have
@@ -53,10 +57,35 @@ Performance
   Possibly if we did one directory at a time that would be sufficient.
 
   Alternatively we could pre-process the arguments to make sure no
-  duplicates will ever be inserted.
+  duplicates will ever be inserted. There could be some bad cases
+  when we're collapsing symlinks.
 
   We could have a hash table.
 
+  The root of the problem is that we do not want more than one file
+  list entry referring to the same file. At first glance there are
+  several ways this could happen: symlinks, hardlinks, and repeated
+  names on the command line.
+
+  If names are repeated on the command line, they may be present in
+  different forms, perhaps by traversing directory paths in different
+  ways, traversing paths including symlinks. Also we need to allow
+  for expansion of globs by rsync.
+
+  At the moment, clean_flist() requires having the entire file list in
+  memory. Duplicate names are detected just by a string comparison.
+
+  We don't need to worry about hard links causing duplicates because
+  files are never updated in place. Similarly for symlinks.
+
+  I think even if we're using a different symlink mode we don't need
+  to worry.
+
+  Unless we're really clever this will introduce a protocol
+  incompatibility, so we need to be able to accept the old format as
+  well.
+
+
 Memory accounting
 
   At exit, show how much memory was used for the file list, etc.
@@ -65,11 +94,19 @@ Memory accounting
   not sure this makes sense with modern mallocs. At any rate it will
   make us allocate a huge amount of memory for large file lists.
 
+  We can try using the GNU/SVID/XPG mallinfo() function to get some
+  heap statistics.
+
+
 Hard-link handling
 
   At the moment hardlink handling is very expensive, so it's off by
   default. It does not need to be so.
 
+  Since most of the solutions are rather intertwined with the file
+  list it is probably better to fix that first, although fixing
+  hardlinks is possibly simpler.
+
   We can rule out hardlinked directories since they will probably
   screw us up in all kinds of ways. They simply should not be used.
 
@@ -236,8 +273,6 @@ SIGHUP
 
 hang/timeout friendliness
 
-  On
-
 verbose output
 
   Indicate whether files are new, updated, or deleted
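
Illustrative note on the "Handling duplicate names" text added by this diff. The sketch below is not rsync code (rsync is written in C) and does not reproduce clean_flist(). It is a small Python illustration, under stated assumptions, of the two ideas the notes mention: detecting duplicates by plain string comparison, and using a hash table keyed on a normalized name. The function names dedupe_by_string and dedupe_by_normalized_path are hypothetical, and sorting first so that equal names become adjacent is an assumption of this sketch.

```python
import os


def dedupe_by_string(names):
    """Drop entries that are byte-for-byte equal to an earlier entry.

    Assumption of this sketch: sort first so equal names end up adjacent,
    then compare each name with its predecessor. The whole list has to be
    in memory, which is the limitation the notes point out.
    """
    result = []
    for name in sorted(names):
        if result and result[-1] == name:
            continue  # exact string duplicate of the previous entry
        result.append(name)
    return result


def dedupe_by_normalized_path(names):
    """Hash-table variant: remember a lexically normalized form of each name.

    os.path.normpath() only rewrites the string ("a/./b" and "a//b" become
    "a/b"); it does not consult the filesystem. It also collapses ".."
    lexically, which can be wrong when a path component is a symlink, so
    this is exactly the kind of "bad case" the notes warn about.
    """
    seen = set()
    result = []
    for name in names:
        key = os.path.normpath(name)
        if key in seen:
            continue  # same file reached through a different spelling
        seen.add(key)
        result.append(name)  # keep the first spelling, in input order
    return result
```

With these definitions, dedupe_by_string treats "dir/f" and "dir/./f" as two different entries, because only identical strings match. dedupe_by_normalized_path keeps just the first spelling and preserves the order of the arguments. Neither approach detects two names that reach the same file through a symlink; that would need filesystem information such as device and inode numbers.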