From: Martin Pool <mbp@samba.org>
Date: Fri, 11 Jan 2002 07:05:30 +0000 (+0000)
Subject: Merge ChangeSet@1.4: Documentation about flist scalability
X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/commitdiff_plain/b3e6c8156529f78b097820ff964bff3e14753286

Merge ChangeSet@1.4: Documentation about flist scalability
---

diff --git a/TODO b/TODO
index 75d4e56a..cb187126 100644
--- a/TODO
+++ b/TODO
@@ -40,10 +40,31 @@ Performance
   start, which makes us use a lot of memory and also not pipeline
   network access as much as we could.
 
+  We need to be careful of duplicate names getting into the file list.
+  See clean_flist.  This could happen if multiple arguments include
+  the same file.  Bad.  
+
+  I think duplicates are only a problem if they're both flowing
+  through the pipeline at the same time.  For example we might have
+  updated the first occurrence after reading the checksums for the
+  second.  So possibly we just need to make sure that we don't have
+  both in the pipeline at the same time.  
+
+  Possibly if we did one directory at a time that would be sufficient.
+
+  Alternatively we could pre-process the arguments to make sure no
+  duplicates will ever be inserted.  
+
+  We could have a hash table.
+
 Memory accounting
 
   At exit, show how much memory was used for the file list, etc.
 
+  Also we do a weird exponential-growth allocation in flist.c.  I'm
+  not sure this makes sense with modern mallocs.  At any rate it will
+  make us allocate a huge amount of memory for large file lists.
+
 Hard-link handling
 
   At the moment hardlink handling is very expensive, so it's off by
@@ -238,3 +259,4 @@ rsyncsh
    current host, directory and so on.  We can probably even do
    completion of remote filenames.
 
+%K%
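
Note on the "We could have a hash table" idea in the first hunk: the
following is only a minimal sketch of what such a duplicate check might
look like, not rsync's actual code. The names name_set, name_set_init,
name_set_add and name_set_free are hypothetical and do not exist in
flist.c. The sketch assumes names are compared byte for byte, that the
caller keeps the strings alive, and that the table is sized up front
rather than grown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of rsync: a fixed-capacity
 * open-addressing set of path names, used to notice a name that has
 * already been added to the file list. */
struct name_set {
    const char **slots;  /* borrowed pointers; NULL marks an empty slot */
    size_t cap;          /* number of slots, always a power of two */
    size_t used;         /* number of occupied slots */
};

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static size_t hash_name(const char *s)
{
    unsigned long h = 2166136261UL;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619UL;
        h &= 0xffffffffUL;
    }
    return (size_t)h;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
static int name_set_init(struct name_set *set, size_t cap)
{
    size_t i;

    assert(cap != 0 && (cap & (cap - 1)) == 0);
    set->slots = malloc(cap * sizeof *set->slots);
    if (set->slots == NULL)
        return -1;
    for (i = 0; i < cap; i++)
        set->slots[i] = NULL;
    set->cap = cap;
    set->used = 0;
    return 0;
}

/* Returns 1 if name was newly added, 0 if an equal name was already
 * present, and -1 if the table is full. */
static int name_set_add(struct name_set *set, const char *name)
{
    size_t i = hash_name(name) & (set->cap - 1);
    size_t probes;

    for (probes = 0; probes < set->cap; probes++) {
        if (set->slots[i] == NULL) {
            set->slots[i] = name;
            set->used++;
            return 1;
        }
        if (strcmp(set->slots[i], name) == 0)
            return 0;
        i = (i + 1) & (set->cap - 1);  /* linear probing */
    }
    return -1;
}

static void name_set_free(struct name_set *set)
{
    free(set->slots);
    set->slots = NULL;
    set->cap = 0;
    set->used = 0;
}
```

A caller would run name_set_add on each name before appending it to the
file list and skip the entry when the result is 0. Because the
comparison is byte for byte, two spellings of the same file (for
example "a" and "./a") would still count as different names, so the
arguments would need to be normalized first for this to catch the
multiple-argument case described in the TODO text.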