From: Martin Pool
Date: Fri, 11 Jan 2002 07:05:30 +0000 (+0000)
Subject: Merge ChangeSet@1.4: Documentation about flist scalability
X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/commitdiff_plain/b3e6c8156529f78b097820ff964bff3e14753286

Merge ChangeSet@1.4: Documentation about flist scalability
---

diff --git a/TODO b/TODO
index 75d4e56a..cb187126 100644
--- a/TODO
+++ b/TODO
@@ -40,10 +40,31 @@ Performance
   start, which makes us use a lot of memory and also not pipeline
   network access as much as we could.
 
+  We need to be careful of duplicate names getting into the file list.
+  See clean_flist.  This could happen if multiple arguments include
+  the same file.  Bad.
+
+  I think duplicates are only a problem if they're both flowing
+  through the pipeline at the same time.  For example we might have
+  updated the first occurrence after reading the checksums for the
+  second.  So possibly we just need to make sure that we don't have
+  both in the pipeline at the same time.
+
+  Possibly if we did one directory at a time that would be sufficient.
+
+  Alternatively we could pre-process the arguments to make sure no
+  duplicates will ever be inserted.
+
+  We could have a hash table.
+
 Memory accounting
 
   At exit, show how much memory was used for the file list, etc.
 
+  Also we do a wierd exponential-growth allocation in flist.c.  I'm
+  not sure this makes sense with modern mallocs.  At any rate it will
+  make us allocate a huge amount of memory for large file lists.
+
 Hard-link handling
 
   At the moment hardlink handling is very expensive, so it's off by
@@ -238,3 +259,4 @@ rsyncsh
   current host, directory and so on.  We can probably even do
   completion of remote filenames.
 
+%K%
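
The "We could have a hash table" note in the patch above could look roughly like the sketch below. It is illustrative only and is not rsync's code: struct name_set, hash_name, name_set_add and drop_duplicate_names are hypothetical names, the table has a fixed size, and names are compared byte for byte, so "./a" and "a" would count as different entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not taken from rsync's flist.c. */
#define NAME_SET_SLOTS 1024          /* fixed power of two, sketch only */

struct name_set {
    const char *slots[NAME_SET_SLOTS];   /* NULL means the slot is empty */
    size_t used;
};

/* FNV-1a-style string hash. */
unsigned long hash_name(const char *s)
{
    unsigned long h = 2166136261UL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619UL;
    }
    return h;
}

/* Returns 1 if name was newly added, 0 if it was already present,
 * and -1 if the table has no room left. Uses linear probing. */
int name_set_add(struct name_set *set, const char *name)
{
    size_t i, idx;

    if (set->used >= NAME_SET_SLOTS - 1)
        return -1;
    idx = hash_name(name) & (NAME_SET_SLOTS - 1);
    for (i = 0; i < NAME_SET_SLOTS; i++) {
        if (set->slots[idx] == NULL) {
            set->slots[idx] = name;
            set->used++;
            return 1;
        }
        if (strcmp(set->slots[idx], name) == 0)
            return 0;
        idx = (idx + 1) & (NAME_SET_SLOTS - 1);
    }
    return -1;
}

/* Compacts names[] in place, keeping the first occurrence of each name.
 * Returns the new count. If the set cannot be allocated the list is
 * left unchanged; if the set fills up, later names are kept unchecked. */
size_t drop_duplicate_names(const char **names, size_t n)
{
    struct name_set *set;
    size_t in, out = 0;

    assert(names != NULL || n == 0);
    set = calloc(1, sizeof *set);    /* zeroed, so every slot starts empty */
    if (set == NULL)
        return n;
    for (in = 0; in < n; in++) {
        if (name_set_add(set, names[in]) != 0)
            names[out++] = names[in];
    }
    free(set);
    return out;
}
```

Calling drop_duplicate_names on the argument-derived names before they enter the pipeline would correspond to the "pre-process the arguments" option. A real version would need a growable table and some path normalisation, both of which are left out here.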