X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/blobdiff_plain/b3e6c8156529f78b097820ff964bff3e14753286..cf72f20426c4b6c9c2467185f85e09e0028d39b6:/TODO

diff --git a/TODO b/TODO
index cb187126..456cd7b1 100644
--- a/TODO
+++ b/TODO
@@ -41,8 +41,8 @@ Performance
   network access as much as we could.
 
 We need to be careful of duplicate names getting into the file list.
-  See clean_flist.  This could happen if multiple arguments include
-  the same file.  Bad.
+  See clean_flist().  This could happen if multiple arguments include
+  the same file.  Bad.
 
   I think duplicates are only a problem if they're both flowing
   through the pipeline at the same time.  For example we might have
@@ -53,10 +53,30 @@ Performance
   Possibly if we did one directory at a time that would be sufficient.
 
   Alternatively we could pre-process the arguments to make sure no
-  duplicates will ever be inserted.
+  duplicates will ever be inserted.  There could be some bad cases
+  when we're collapsing symlinks.  We could have a hash table.
+
+  The root of the problem is that we do not want more than one file
+  list entry referring to the same file.  At first glance there are
+  several ways this could happen: symlinks, hardlinks, and repeated
+  names on the command line.
+
+  If names are repeated on the command line, they may be present in
+  different forms, perhaps by traversing directory paths in different
+  ways, traversing paths including symlinks.  Also we need to allow
+  for expansion of globs by rsync.
+
+  At the moment, clean_flist() requires having the entire file list in
+  memory.  Duplicate names are detected just by a string comparison.
+
+  We don't need to worry about hard links causing duplicates because
+  files are never updated in place.  Similarly for symlinks.
+
+  I think even if we're using a different symlink mode we don't need
+  to worry.
+
 Memory accounting
 
   At exit, show how much memory was used for the file list, etc.
@@ -236,8 +256,6 @@ SIGHUP
 
   hang/timeout friendliness
 
-  On
-
 verbose output
 
   Indicate whether files are new, updated, or deleted