TODO

   1 -*- indented-text -*-
   2
   3 URGENT ---------------------------------------------------------------
   4
   5
   6 IMPORTANT ------------------------------------------------------------
   7
   8 Cross-test versions
   9
  10   Part of the regression suite should be making sure that we don't
  11   break backwards compatibility: old clients vs new servers and so
  12   on.  Ideally we would test the cross product of versions.
  13
  14   It might be sufficient to test downloads from well-known public
  15   rsync servers running different versions of rsync.  This will give
  16   some testing and also be the most common case for having different
  17   versions and not being able to upgrade.
  18
  19 use chroot
  20
  21   If the platform doesn't support it, then don't even try.
  22
  23   If running as non-root, then don't fail, just give a warning.
  24   (There was a thread about this a while ago?)
  25
  26     http://lists.samba.org/pipermail/rsync/2001-August/thread.html
  27     http://lists.samba.org/pipermail/rsync/2001-September/thread.html
  28
  29 --files-from
  30
  31   Avoids traversal.  Better option than a pile of --include statements
  32   for people who want to generate the file list using a find(1)
  33   command or a script.
  34
  35
  36 Performance
  37
  38   Traverse just one directory at a time.  Tridge says it's possible.
  39
  40   At the moment rsync reads the whole file list into memory at the
  41   start, which makes us use a lot of memory and also not pipeline
  42   network access as much as we could.
  43
  44
  45 Handling duplicate names
  46
  47   We need to be careful of duplicate names getting into the file list.
  48   See clean_flist().  This could happen if multiple arguments include
  49   the same file.  Bad.
  50
  51   I think duplicates are only a problem if they're both flowing
  52   through the pipeline at the same time.  For example we might have
  53   updated the first occurrence after reading the checksums for the
  54   second.  So possibly we just need to make sure that we don't have
  55   both in the pipeline at the same time.
  56
  57   Possibly if we did one directory at a time that would be sufficient.
  58
  59   Alternatively we could pre-process the arguments to make sure no
  60   duplicates will ever be inserted.  There could be some bad cases
  61   when we're collapsing symlinks.
  62
  63   We could have a hash table.
  64
  65   The root of the problem is that we do not want more than one file
  66   list entry referring to the same file.  At first glance there are
  67   several ways this could happen: symlinks, hardlinks, and repeated
  68   names on the command line.
  69
  70   If names are repeated on the command line, they may be present in
  71   different forms, perhaps by traversing directory paths in different
  72   ways, traversing paths including symlinks.  Also we need to allow
  73   for expansion of globs by rsync.
  74
  75   At the moment, clean_flist() requires having the entire file list in
  76   memory.  Duplicate names are detected just by a string comparison.
  77
  78   We don't need to worry about hard links causing duplicates because
  79   files are never updated in place.  Similarly for symlinks.
  80
  81   I think even if we're using a different symlink mode we don't need
  82   to worry.
  83
  84   Unless we're really clever this will introduce a protocol
  85   incompatibility, so we need to be able to accept the old format as
  86   well.
  87
  88
  89 Memory accounting
  90
  91   At exit, show how much memory was used for the file list, etc.
  92
  93   Also we do a wierd exponential-growth allocation in flist.c.  I'm
  94   not sure this makes sense with modern mallocs.  At any rate it will
  95   make us allocate a huge amount of memory for large file lists.
  96
  97   We can try using the GNU/SVID/XPG mallinfo() function to get some
  98   heap statistics.
  99
 100
 101 Hard-link handling
 102
 103   At the moment hardlink handling is very expensive, so it's off by
 104   default.  It does not need to be so.
 105
 106   Since most of the solutions are rather intertwined with the file
 107   list it is probably better to fix that first, although fixing
 108   hardlinks is possibly simpler.
 109
 110   We can rule out hardlinked directories since they will probably
 111   screw us up in all kinds of ways.  They simply should not be used.
 112
 113   At the moment rsync only cares about hardlinks to regular files.  I
 114   guess you could also use them for sockets, devices and other beasts,
 115   but I have not seen them.
 116
 117   When trying to reproduce hard links, we only need to worry about
 118   files that have more than one name (nlinks>1 && !S_ISDIR).
 119
 120   The basic point of this is to discover alternate names that refer to
 121   the same file.  All operations, including creating the file and
 122   writing modifications to it need only to be done for the first name.
 123   For all later names, we just create the link and then leave it
 124   alone.
 125
 126   If hard links are to be preserved:
 127
 128     Before the generator/receiver fork, the list of files is received
 129     from the sender (recv_file_list), and a table for detecting hard
 130     links is built.
 131
 132     The generator looks for hard links within the file list and does
 133     not send checksums for them, though it does send other metadata.
 134
 135     The sender sends the device number and inode with file entries, so
 136     that files are uniquely identified.
 137
 138     The receiver goes through and creates hard links (do_hard_links)
 139     after all data has been written, but before directory permissions
 140     are set.
 141
 142   At the moment device and inum are sent as 4-byte integers, which
 143   will probably cause problems on large filesystems.  On Linux the
 144   kernel uses 64-bit ino_t's internally, and people will soon have
 145   filesystems big enough to use them.  We ought to follow NFS4 in
 146   using 64-bit device and inode identification, perhaps with a
 147   protocol version bump.
 148
 149   Once we've seen all the names for a particular file, we no longer
 150   need to think about it and we can deallocate the memory.
 151
 152   We can also have the case where there are links to a file that are
 153   not in the tree being transferred.  There's nothing we can do about
 154   that.  Because we rename the destination into place after writing,
 155   any hardlinks to the old file are always going to be orphaned.  In
 156   fact that is almost necessary because otherwise we'd get really
 157   confused if we were generating checksums for one name of a file and
 158   modifying another.
 159
 160   At the moment the code seems to make a whole second copy of the file
 161   list, which seems unnecessary.
 162
 163   We should have a test case that exercises hard links.  Since it
 164   might be hard to compare ./tls output where the inodes change we
 165   might need a little program to check whether several names refer to
 166   the same file.
 167
 168 IPv6
 169
 170   Implement suggestions from http://www.kame.net/newsletter/19980604/
 171   and ftp://ftp.iij.ad.jp/pub/RFC/rfc2553.txt
 172
 173   If a host has multiple addresses, then listen try to connect to all
 174   in order until we get through.  (getaddrinfo may return multiple
 175   addresses.)  This is kind of implemented already.
 176
 177   Possibly also when starting as a server we may need to listen on
 178   multiple passive addresses.  This might be a bit harder, because we
 179   may need to select on all of them.  Hm.
 180
 181   Define a syntax for IPv6 literal addresses.  Since they include
 182   colons, they tend to break most naming systems, including ours.
 183   Based on the HTTP IPv6 syntax, I think we should use
 184
 185      rsync://[::1]/foo/bar
 186      [::1]::bar
 187
 188   which should just take a small change to the parser code.
 189
 190 Errors
 191
 192   If we hang or get SIGINT, then explain where we were up to.  Perhaps
 193   have a static buffer that contains the current function name, or
 194   some kind of description of what we were trying to do.  This is a
 195   little easier on people than needing to run strace/truss.
 196
 197   "The dungeon collapses!  You are killed."  Rather than "unexpected
 198   eof" give a message that is more detailed if possible and also more
 199   helpful.
 200
 201 File attributes
 202
 203   Device major/minor numbers should be at least 32 bits each.  See
 204   http://lists.samba.org/pipermail/rsync/2001-November/005357.html
 205
 206   Transfer ACLs.  Need to think of a standard representation.
 207   Probably better not to even try to convert between NT and POSIX.
 208   Possibly can share some code with Samba.
 209
 210 Empty directories
 211
 212   With the current common --include '*/' --exclude '*' pattern, people
 213   can end up with many empty directories.  We might avoid this by
 214   lazily creating such directories.
 215
 216 zlib
 217
 218   Perhaps don't use our own zlib.  Will we actually be incompatible,
 219   or just be slightly less efficient?
 220
 221 logging
 222
 223   Perhaps flush stdout after each filename, so that people trying to
 224   monitor progress in a log file can do so more easily.  See
 225   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
 226
 227   At the connections that just get a list of modules are not logged,
 228   but they should be.
 229
 230 rsyncd over ssh
 231
 232   There are already some patches to do this.
 233
 234 proxy authentication
 235
 236   Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
 237   HTTP Basic Proxy-Authentication.
 238
 239   Multiple schemes are possible, up to and including the insanity that
 240   is NTLM, but Basic probably covers most cases.
 241
 242 SOCKS
 243
 244   Add --with-socks, and then perhaps a command-line option to put them
 245   on or off.  This might be more reliable than LD_PRELOAD hacks.
 246
 247 PLATFORMS ------------------------------------------------------------
 248
 249 Win32
 250
 251   Don't detach, because this messes up --srvany.
 252
 253   http://sources.redhat.com/ml/cygwin/2001-08/msg00234.html
 254
 255   According to "Effective TCP/IP Programming" (??) close() on a socket
 256   has incorrect behaviour on Windows -- it sends a RST packet to the
 257   other side, which gives a "connection reset by peer" error.  On that
 258   platform we should probably do shutdown() instead.  However, on Unix
 259   we are correct to call close(), because shutdown() discards
 260   untransmitted data.
 261
 262 DOCUMENTATION --------------------------------------------------------
 263
 264 Update README
 265
 266 BUILD FARM -----------------------------------------------------------
 267
 268 Add machines
 269
 270   AMDAHL UTS (Dave Dykstra)
 271
 272   Cygwin (on different versions of Win32?)
 273
 274   HP-UX variants (via HP?)
 275
 276   SCO
 277
 278 NICE -----------------------------------------------------------------
 279
 280 --no-detach and --no-fork options
 281
 282   Very useful for debugging.  Also good when running under a
 283   daemon-monitoring process that tries to restart the service when the
 284   parent exits.
 285
 286 hang/timeout friendliness
 287
 288 verbose output
 289
 290   Indicate whether files are new, updated, or deleted
 291
 292 internationalization
 293
 294   Change to using gettext().  Probably need to ship this for platforms
 295   that don't have it.
 296
 297   Solicit translations.
 298
 299   Does anyone care?
 300
 301 rsyncsh
 302
 303    Write a small emulation of interactive ftp as a Pythonn program
 304    that calls rsync.  Commands such as "cd", "ls", "ls *.c" etc map
 305    fairly directly into rsync commands: it just needs to remember the
 306    current host, directory and so on.  We can probably even do
 307    completion of remote filenames.
 308
 309 %K%