-*- indented-text -*-

BUGS ---------------------------------------------------------------
Fix hardlink reporting 2002/03/25
Fix progress indicator to not corrupt log
lchmod question
Do not rely on having a group called "nobody"
Incorrect timestamps (Debian #100295)
Win32

FEATURES ------------------------------------------------------------
server-imposed bandwidth limits
rsyncd over ssh
Use chroot only if supported
Allow supplementary groups in rsyncd.conf 2002/04/09
Handling IPv6 on old machines
Other IPv6 stuff:
Add ACL support 2001/12/02
Lazy directory creation
Conditional -z for old protocols
proxy authentication 2002/01/23
SOCKS 2002/01/23
FAT support
Allow forcing arbitrary permissions 2002/03/12
--diff david.e.sewell 2002/03/15
Add daemon --no-detach and --no-fork options
Create more granular verbosity jw 2003/05/15

DOCUMENTATION --------------------------------------------------------
Update README
Keep list of open issues and todos on the web site
Update web site from CVS
Perhaps redo manual as SGML

LOGGING --------------------------------------------------------------
Make dry run list all updates 2002/04/03
Memory accounting
Improve error messages
Better statistics: Rasmus 2002/03/08
Perhaps flush stdout like syslog
Log daemon sessions that just list modules
Log child death on signal
Keep stderr and stdout properly separated (Debian #23626)
Log errors with function that reports process of origin
verbose output David Stein 2001/12/20
Add reason for transfer to file logging
debugging of daemon 2002/04/08
internationalization

DEVELOPMENT --------------------------------------------------------
Handling duplicate names
Use generic zlib 2002/02/25
TDB: 2002/03/12
Splint 2002/03/12
Memory debugger
Create release script
Add machines to build farm

PERFORMANCE ----------------------------------------------------------
File list structure in memory
Traverse just one directory at a time
Hard-link handling
Allow skipping MD4 file_sum 2002/04/08
Accelerate MD4
String area code

TESTING --------------------------------------------------------------
Torture test
Cross-test versions 2001/08/22
Test on kernel source
Test large files
Create mutator program for testing
Create configure option to enable dangerous tests
If tests are skipped, say why.
Test daemon feature to disallow particular options.
Create pipe program for testing
Create test makefile target for some tests
Test "refuse options" works

RELATED PROJECTS -----------------------------------------------------
rsyncsh
http://rsync.samba.org/rsync-and-debian/
rsyncable gzip patch
rsyncsplit as alternative to real integration with gzip?
reverse rsync over HTTP Range


BUGS ---------------------------------------------------------------

Fix hardlink reporting 2002/03/25
  (was: There seems to be a bug with hardlinks)

    mbp/2 build$ ls -l /tmp/a /tmp/b -i
    /tmp/a:
    total 32
    2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
    2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
    2568307 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
    2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
    2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
    2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
    2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
    2568310 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3

    /tmp/b:
    total 32
    2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
    2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
    2568309 -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
    2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
    2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
    2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
    2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
    2568311 -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
    mbp/2 build$ rm -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
    building file list ... done
    created directory /tmp/b
    ./
    a1
    a4
    a2 => a1
    a3 => a2
    wrote 350 bytes read 52 bytes 804.00 bytes/sec
    total size is 232 speedup is 0.58
    mbp/2 build$ rm -r /tmp/b
    mbp/2 build$ ls -l /tmp/b
    ls: /tmp/b: No such file or directory
    mbp/2 build$ rm -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
    rm: cannot remove `/tmp/b': No such file or directory
    mbp/2 build$ rm -f -r /tmp/b && ./rsync -avH /tmp/a/ /tmp/b
    building file list ... done
    created directory /tmp/b
    ./
    a1
    a4
    a2 => a1
    a3 => a2
    wrote 350 bytes read 52 bytes 804.00 bytes/sec
    total size is 232 speedup is 0.58
    mbp/2 build$ ls -l /tmp/b
    total 32
    -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
    -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
    -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3
    mbp/2 build$ ls -l /tmp/a
    total 32
    -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a1
    -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a2
    -rw-rw-r-- 3 mbp mbp 29 Mar 25 17:30 a3
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a4
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 a5
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b1
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b2
    -rw-rw-r-- 5 mbp mbp 29 Mar 25 17:30 b3

  -- --


Fix progress indicator to not corrupt log

  Progress indicator can produce corrupt output when transferring directories:

    main/binary-arm/
    main/binary-arm/admin/
    main/binary-arm/base/
    main/binary-arm/comm/8.56kB/s 0:00:52
    main/binary-arm/devel/
    main/binary-arm/doc/
    main/binary-arm/editors/
    main/binary-arm/electronics/s 0:00:53
    main/binary-arm/games/
    main/binary-arm/graphics/
    main/binary-arm/hamradio/
    main/binary-arm/interpreters/
    main/binary-arm/libs/6.61kB/s 0:00:54
    main/binary-arm/mail/
    main/binary-arm/math/
    main/binary-arm/misc/

  -- --


lchmod question

  I don't think we handle this properly on systems that don't have the
  lchmod() call. Are there any such?

  -- --


Do not rely on having a group called "nobody"

  http://www.linuxbase.org/spec/refspecs/LSB_1.1.0/gLSB/usernames.html

  On Debian it's "nogroup"
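
  Here is a minimal sketch of one way to avoid hard-coding the name. It
  assumes we simply try a list of candidate group names in order and
  fall back to a gid chosen by the caller if none of them exists. The
  helper find_fallback_gid() is hypothetical; nothing like it exists in
  rsync today.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <grp.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper (not in rsync): return the gid of the first group
 * in names[] that exists on this system, or `fallback` if none does.
 * Passing { "nobody", "nogroup" } covers the usual name and Debian's. */
static gid_t find_fallback_gid(const char *const *names, size_t count,
                               gid_t fallback)
{
    for (size_t i = 0; i < count; i++) {
        struct group *gr = getgrnam(names[i]);

        if (gr != NULL)
            return gr->gr_gid;
    }
    return fallback;
}
```

  For the last-resort value, a caller might choose something like
  65534, which Debian uses for nogroup. That choice is an assumption
  and is not guaranteed on other systems.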

  -- --


Incorrect timestamps (Debian #100295)

  A bit hard to believe, but apparently it happens.

  -- --


Win32

  Don't detach, because this messes up --srvany.

  http://sources.redhat.com/ml/cygwin/2001-08/msg00234.html

  -- --

FEATURES ------------------------------------------------------------

server-imposed bandwidth limits

  -- --


rsyncd over ssh

  There are already some patches to do this.

  BitKeeper uses a server whose login shell is set to bkd. That's
  probably a reasonable approach.

  -- --


Use chroot only if supported

  If the platform doesn't support it, then don't even try.

  If running as non-root, then don't fail, just give a warning.
  (There was a thread about this a while ago?)

  http://lists.samba.org/pipermail/rsync/2001-August/thread.html
  http://lists.samba.org/pipermail/rsync/2001-September/thread.html

  -- --


Allow supplementary groups in rsyncd.conf 2002/04/09

  Perhaps allow supplementary groups to be specified in rsyncd.conf;
  then make the first one the primary gid and all the rest be
  supplementary gids.

  -- --


Handling IPv6 on old machines

  The KAME IPv6 patch is nice in theory but has proved a bit of a
  nightmare in practice. The basic idea of their patch is that rsync
  is rewritten to use the new getaddrinfo()/getnameinfo() interface,
  rather than gethostbyname()/gethostbyaddr() as in rsync 2.4.6.
  Systems that don't have the new interface are handled by providing
  our own implementation in lib/, which is selectively linked in.

  The problem with this is that it is really hard to get right on
  platforms that have a half-working implementation, so redefining
  these functions clashes with system headers, and leaving them out
  breaks. This affects at least OSF/1, RedHat 5, and Cobalt, which
  are moderately important.

  Perhaps the simplest solution would be to have two different files
  implementing the same interface, and choose either the new or the
  old API. This is probably necessary for systems that e.g. have
  IPv6, but gethostbyaddr() can't handle it. The Linux manpage claims
  this is currently the case.

  In fact, our internal sockets interface (things like
  open_socket_out(), etc.) is much narrower than the getaddrinfo()
  interface, and so probably simpler to get right. In addition, the
  old code is known to work well on old machines.

  We could drop the rather large lib/getaddrinfo files.

  -- --


Other IPv6 stuff:

  Implement suggestions from http://www.kame.net/newsletter/19980604/
  and ftp://ftp.iij.ad.jp/pub/RFC/rfc2553.txt

  If a host has multiple addresses, then try to connect to each of them
  in order until we get through. (getaddrinfo may return multiple
  addresses.) This is kind of implemented already.
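
  The "try every address in order" loop can be sketched with the
  standard getaddrinfo() interface. connect_any() is a hypothetical
  name, and this is not rsync's actual socket code. It only shows the
  shape of the loop, including skipping address families for which the
  host cannot create sockets.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch (not rsync's socket.c): resolve host/service and
 * try each address getaddrinfo() returns, in order, until a connect()
 * succeeds.  Returns the connected fd, or -1 if every address failed. */
static int connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                   /* e.g. no IPv6 support here */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* got through */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

  Because AF_UNSPEC asks for both IPv4 and IPv6 results, a host that has
  both kinds of address is tried over each, in whatever order
  getaddrinfo() returns them.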

  Possibly also when starting as a server we may need to listen on
  multiple passive addresses. This might be a bit harder, because we
  may need to select on all of them. Hm.

  Define a syntax for IPv6 literal addresses. Since they include
  colons, they tend to break most naming systems, including ours.
  Based on the HTTP IPv6 syntax, I think we should use

    rsync://[::1]/foo/bar
    [::1]::bar

  which should just take a small change to the parser code.
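
  Most of that parser change is splitting off the bracketed host. This
  is a minimal sketch, assuming the caller has already stripped any
  "rsync://" prefix; parse_bracketed_host() is a hypothetical name.
  Whatever follows the ']' ("::bar" or "/foo/bar") could then go through
  the existing host-less parsing unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not rsync's parser): if spec starts with a
 * bracketed IPv6 literal such as "[::1]", copy the address without the
 * brackets into host[] and return a pointer to the text after ']'.
 * Returns NULL if there is no literal, no closing ']', or host[] is
 * too small.  Any "rsync://" prefix must already have been removed. */
static const char *parse_bracketed_host(const char *spec, char *host,
                                        size_t hostlen)
{
    const char *end;
    size_t n;

    if (spec[0] != '[')
        return NULL;
    end = strchr(spec + 1, ']');
    if (end == NULL)
        return NULL;
    n = (size_t)(end - (spec + 1));
    if (n + 1 > hostlen)
        return NULL;
    memcpy(host, spec + 1, n);
    host[n] = '\0';
    return end + 1;
}
```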

  -- --


Add ACL support 2001/12/02

  Transfer ACLs. Need to think of a standard representation.
  Probably better not to even try to convert between NT and POSIX.
  Possibly can share some code with Samba.

  -- --


Lazy directory creation

  With the current common --include '*/' --exclude '*' pattern, people
  can end up with many empty directories. We might avoid this by
  lazily creating such directories.

  -- --


Conditional -z for old protocols

  After we get the @RSYNCD greeting from the server, we know its
  version but we have not yet sent the command line, so we could just
  remove the -z option if the server is too old.

  For ssh invocation it's not so simple, because we actually use the
  command line to start the remote process. However, we only actually
  do compression in token.c, so once we discover the remote version we
  could emit an error if it's too old. I'm not sure if that's a good
  tradeoff or not.
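
  The daemon-side check can be sketched as follows, assuming the
  greeting line has the form "@RSYNCD: <version>". Both helper names are
  hypothetical. The comparison against the oldest version that handles
  -z is left to the caller, because that cut-off would have to come from
  the protocol history. Bundled short options such as -avz are
  deliberately not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: pull the protocol version out of a daemon greeting of
 * the form "@RSYNCD: <version>".  Returns -1 if the line doesn't match. */
static int greeting_version(const char *line)
{
    int version;

    if (sscanf(line, "@RSYNCD: %d", &version) != 1)
        return -1;
    return version;
}

/* Hypothetical: remove standalone "-z" / "--compress" arguments in place
 * and return the new argc.  argv must be NULL-terminated (argc + 1
 * slots).  Bundled short options such as "-avz" are NOT handled. */
static int drop_compress_args(char **argv, int argc)
{
    int out = 0;

    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-z") == 0 || strcmp(argv[i], "--compress") == 0)
            continue;
        argv[out++] = argv[i];
    }
    argv[out] = NULL;
    return out;
}
```

  The client would call drop_compress_args() only when
  greeting_version() reports something below the chosen minimum, and
  before the command line is sent.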

  -- --


proxy authentication 2002/01/23

  Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
  HTTP Basic Proxy-Authentication.

  Multiple schemes are possible, up to and including the insanity that
  is NTLM, but Basic probably covers most cases.
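
  This is a minimal sketch of building the header for the Basic case.
  It assumes the credentials are everything between "http://" and the
  last '@' before the first '/'. proxy_auth_header() and base64_encode()
  are hypothetical helpers. The encoding is standard base64 (RFC 4648),
  which is what HTTP Basic authentication uses. Percent-encoded
  characters in the user name or password are not decoded here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard base64 (RFC 4648).  out must hold 4 * ((len + 2) / 3) + 1
 * bytes; the result is NUL-terminated. */
static void base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i;

    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        *out++ = tbl[(v >> 18) & 63];
        *out++ = tbl[(v >> 12) & 63];
        *out++ = tbl[(v >> 6) & 63];
        *out++ = tbl[v & 63];
    }
    if (i < len) {                           /* one or two bytes left */
        unsigned long v = (unsigned long)in[i] << 16;

        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        *out++ = tbl[(v >> 18) & 63];
        *out++ = tbl[(v >> 12) & 63];
        *out++ = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        *out++ = '=';
    }
    *out = '\0';
}

/* Hypothetical: build "Proxy-Authorization: Basic ..." from an
 * RSYNC_PROXY value such as "http://user:pass@proxy.foo:3128/".
 * Returns 0 on success, -1 if there are no credentials or a buffer is
 * too small. */
static int proxy_auth_header(const char *proxy, char *hdr, size_t hdrlen)
{
    const char *p = proxy, *at = NULL, *q;
    char enc[256];
    size_t n;
    int r;

    if (strncmp(p, "http://", 7) == 0)
        p += 7;
    for (q = p; *q != '\0' && *q != '/'; q++)
        if (*q == '@')
            at = q;                          /* last '@' before the path */
    if (at == NULL)
        return -1;
    n = (size_t)(at - p);
    if (4 * ((n + 2) / 3) + 1 > sizeof enc)
        return -1;
    base64_encode((const unsigned char *)p, n, enc);
    r = snprintf(hdr, hdrlen, "Proxy-Authorization: Basic %s\r\n", enc);
    return (r < 0 || (size_t)r >= hdrlen) ? -1 : 0;
}
```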

  -- --


SOCKS 2002/01/23

  Add --with-socks, and then perhaps a command-line option to put them
  on or off. This might be more reliable than LD_PRELOAD hacks.

  -- --


FAT support

  rsync to a FAT partition on a Unix machine doesn't work very well at
  the moment. I think we get errors about invalid filenames and
  perhaps also trying to do atomic renames.

  I guess the code to do this is currently #ifdef'd on Windows;
  perhaps we ought to intelligently fall back to it on Unix too.

  -- --


Allow forcing arbitrary permissions 2002/03/12

  On 12 Mar 2002, Dave Dykstra <dwd@bell-labs.com> wrote:
  > If we would add an option to do that functionality, I
  > would vote for one that was more general which could mask
  > off any set of permission bits and possibly add any set of
  > bits. Perhaps a chmod-like syntax if it could be
  > implemented simply.

  I think that would be good too. For example, people uploading files
  to a web server might like to say

    rsync -avzP --chmod a+rX ./ sourcefrog.net:/home/www/sourcefrog/

  Ideally the patch would implement as many of the GNU chmod semantics
  as possible. I think the mode parser should be a separate function
  that passes back something like a (mask,set) description to the rest
  of the program. For bonus points there would be a test case for the
  parser.
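
  Here is a sketch of what such a parser might look like, using the
  (mask,set) idea above, where new = (old & ~mask) | set. Because "X"
  depends on the file being changed, it is carried in a separate xset
  field and tested against the original mode. parse_chmod() and
  apply_chmod() are hypothetical names. Only a subset of GNU chmod is
  covered:
  - no numeric modes, no s/t bits, and no "u=g" copies;
  - the umask is ignored when no who-letter is given;
  - "-X" is treated like "-x".

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical (mask, set) description of a chmod-style change:
 * new = (old & ~mask) | set, plus xset when the file is a directory or
 * already has some execute bit (the "X" rule). */
struct mode_change {
    mode_t mask, set, xset;
};

#define ALL_R (S_IRUSR | S_IRGRP | S_IROTH)
#define ALL_W (S_IWUSR | S_IWGRP | S_IWOTH)
#define ALL_X (S_IXUSR | S_IXGRP | S_IXOTH)

/* Parse comma-separated clauses of the form [ugoa]*[+-=][rwxX]*.
 * Returns 0 on success, -1 on a syntax error. */
static int parse_chmod(const char *s, struct mode_change *mc)
{
    mc->mask = mc->set = mc->xset = 0;
    for (;;) {
        mode_t who = 0, bits = 0, xbits = 0, m, st;
        char op;

        for (; *s != '\0' && strchr("ugoa", *s) != NULL; s++)
            who |= *s == 'u' ? S_IRWXU : *s == 'g' ? S_IRWXG :
                   *s == 'o' ? S_IRWXO : (S_IRWXU | S_IRWXG | S_IRWXO);
        if (who == 0)                        /* GNU would also apply umask */
            who = S_IRWXU | S_IRWXG | S_IRWXO;
        op = *s;
        if (op != '+' && op != '-' && op != '=')
            return -1;
        for (s++; *s != '\0' && *s != ','; s++) {
            switch (*s) {
            case 'r': bits |= ALL_R & who; break;
            case 'w': bits |= ALL_W & who; break;
            case 'x': bits |= ALL_X & who; break;
            case 'X': xbits |= ALL_X & who; break;
            default:  return -1;
            }
        }
        if (op == '+') {
            m = 0; st = bits;
        } else if (op == '-') {              /* "-X" acts like "-x" here */
            m = bits | xbits; st = 0; xbits = 0;
        } else {                             /* '=' */
            m = who; st = bits;
        }
        /* compose this clause with the ones already parsed */
        mc->set = (mc->set & ~m) | st;
        mc->xset = (mc->xset & ~m) | xbits;
        mc->mask |= m;
        if (*s != ',')
            return 0;
        s++;
    }
}

/* Apply a parsed change; the X test looks at the original mode. */
static mode_t apply_chmod(const struct mode_change *mc, mode_t old, int is_dir)
{
    mode_t mode = (old & ~mc->mask) | mc->set;

    if (is_dir || (old & ALL_X))
        mode |= mc->xset;
    return mode;
}
```

  With this, "a+rX" turns a 0600 file into 0644 and a 0600 directory
  into 0755, which is what the web-upload example wants.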

  Possibly also --chown

  (Debian #23628)

  -- --


--diff david.e.sewell 2002/03/15

  Allow people to specify the diff command. (Might want to use wdiff,
  gnudiff, etc.)

  Just diff the temporary file with the destination file, and delete
  the tmp file rather than moving it into place.

  Interaction with --partial.

  Security interactions with daemon mode?

  -- --


Add daemon --no-detach and --no-fork options

  Very useful for debugging. Also good when running under a
  daemon-monitoring process that tries to restart the service when the
  parent exits.

  -- --


Create more granular verbosity jw 2003/05/15

  Control output with the --report option.

  The option takes as a single argument (no whitespace) a
  comma-delimited list of keywords.

  This would separate debugging from "logging" and would also allow
  fine-grained selection of statistical reporting and of what
  actions are logged.

  http://lists.samba.org/archive/rsync/2003-May/006059.html
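
  This sketch parses the comma-delimited argument into a bitmask. The
  keyword names (debug, stats, new, update, delete) are hypothetical
  placeholders drawn from items elsewhere in this file; the real set was
  never settled here. parse_report() is likewise a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical --report keywords; the real set was never settled. */
enum {
    REPORT_DEBUG  = 1 << 0,
    REPORT_STATS  = 1 << 1,
    REPORT_NEW    = 1 << 2,
    REPORT_UPDATE = 1 << 3,
    REPORT_DELETE = 1 << 4
};

static const struct {
    const char *name;
    unsigned bit;
} report_words[] = {
    { "debug", REPORT_DEBUG },   { "stats", REPORT_STATS },
    { "new", REPORT_NEW },       { "update", REPORT_UPDATE },
    { "delete", REPORT_DELETE },
};

#define N_REPORT_WORDS (sizeof report_words / sizeof report_words[0])

/* Parse a comma-delimited list such as "stats,new,delete" into a
 * bitmask.  Returns 0 and stores the mask in *out, or -1 on an empty or
 * unknown keyword (in which case *out is left unchanged). */
static int parse_report(const char *arg, unsigned *out)
{
    unsigned mask = 0;
    const char *p = arg;

    for (;;) {
        size_t len = strcspn(p, ","), i;

        for (i = 0; i < N_REPORT_WORDS; i++)
            if (strlen(report_words[i].name) == len &&
                strncmp(report_words[i].name, p, len) == 0)
                break;
        if (i == N_REPORT_WORDS)
            return -1;
        mask |= report_words[i].bit;
        p += len;
        if (*p == '\0')
            break;
        p++;                                 /* skip the ',' */
    }
    *out = mask;
    return 0;
}
```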

  -- --

DOCUMENTATION --------------------------------------------------------

Update README

  -- --


Keep list of open issues and todos on the web site

  -- --


Update web site from CVS

  -- --


Perhaps redo manual as SGML

  The man page is getting rather large, and there is more information
  that ought to be added.

  Texinfo source is probably a dying format.

  Linuxdoc looks like the most likely contender. I know DocBook is
  favoured by some people, but it's so bloody verbose, even with emacs
  support.

  -- --

LOGGING --------------------------------------------------------------

Make dry run list all updates 2002/04/03

  --dry-run is too dry

  Mark Santcroos points out that -n fails to list files which have
  only metadata changes, though it probably should.

  There may be a Debian bug about this as well.

  -- --


Memory accounting

  At exit, show how much memory was used for the file list, etc.

  Also we do a weird exponential-growth allocation in flist.c. I'm
  not sure this makes sense with modern mallocs. At any rate it will
  make us allocate a huge amount of memory for large file lists.

  -- --


Improve error messages

  If we hang or get SIGINT, then explain where we were up to. Perhaps
  have a static buffer that contains the current function name, or
  some kind of description of what we were trying to do. This is a
  little easier on people than needing to run strace/truss.

  "The dungeon collapses! You are killed." Rather than "unexpected
  eof", give a message that is more detailed if possible and also more
  helpful.

  If we get an error writing to a socket, then we should perhaps
  continue trying to read to see if an error message comes across
  explaining why the socket is closed. I'm not sure if this would
  work, but it would certainly make our messages more helpful.

  What happens if a directory is missing -x attributes? Do we lose
  our load? (Debian #28416) Probably fixed now, but a test case would
  be good.

  -- --


Better statistics: Rasmus 2002/03/08

  <Rasmus>
    hey, how about an rsync option that just gives you the
    summary without the list of files? And perhaps gives
    more information like the number of new files, number
    of changed, deleted, etc. ?

  <mbp>
    nice idea there is --stats but at the moment it's very
    tridge-oriented rather than user-friendly it would be
    nice to improve it that would also work well with
    --dryrun

  -- --


Perhaps flush stdout like syslog

  Perhaps flush stdout after each filename, so that people trying to
  monitor progress in a log file can do so more easily. See
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108

  -- --


Log daemon sessions that just list modules

  At the moment, connections that just get a list of modules are not
  logged, but they should be.

  -- --


Log child death on signal

  If a child of the rsync daemon dies with a signal, we should notice
  that when we reap it and log a message.
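
  A sketch of the reaping loop, using waitpid() with WNOHANG and the
  standard WIFSIGNALED()/WTERMSIG() macros. reap_children() and the log
  text are hypothetical. fprintf() is not async-signal-safe, so this
  assumes the loop runs from the daemon's main loop (for instance after
  a SIGCHLD handler has set a flag), not inside the handler itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: reap every child that has already exited,
 * without blocking, and log the ones that were killed by a signal.
 * Returns how many such children were found. */
static int reap_children(FILE *log)
{
    int status, killed = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (WIFSIGNALED(status)) {
            fprintf(log, "rsync daemon: child %ld killed by signal %d\n",
                    (long)pid, WTERMSIG(status));
            killed++;
        }
    }
    return killed;
}
```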

  -- --


Keep stderr and stdout properly separated (Debian #23626)

  -- --


Log errors with function that reports process of origin

  Use a separate function for reporting errors; prefix it with
  "rsync:" or "rsync(remote)", or perhaps even "rsync(local
  generator): ".
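
  Such a function might look like the sketch below, assuming three
  origins. The enum, origin_prefix() and format_error() are all
  hypothetical. A real version would write to the log or stderr rather
  than to a buffer. Each prefix here ends in ": ".

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical origins for an error message. */
enum origin { ORIGIN_LOCAL, ORIGIN_REMOTE, ORIGIN_GENERATOR };

static const char *origin_prefix(enum origin who)
{
    switch (who) {
    case ORIGIN_REMOTE:    return "rsync(remote): ";
    case ORIGIN_GENERATOR: return "rsync(local generator): ";
    default:               return "rsync: ";
    }
}

/* Hypothetical: format an error with its origin prefix into buf.
 * Returns the length snprintf would report, or a negative value on an
 * encoding error. */
static int format_error(char *buf, size_t len, enum origin who,
                        const char *fmt, ...)
{
    va_list ap;
    int n, m;

    n = snprintf(buf, len, "%s", origin_prefix(who));
    if (n < 0 || (size_t)n >= len)
        return n;
    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}
```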

  -- --


verbose output David Stein 2001/12/20

  Indicate whether files are new, updated, or deleted.

  At end of transfer, show how many files were or were not transferred
  correctly.

  -- --


Add reason for transfer to file logging

  Explain *why* every file is transferred or not (e.g. "local mtime
  123123 newer than 1283198")

  -- --


debugging of daemon 2002/04/08

  Add an rsyncd.conf parameter to turn on debugging on the server.

  -- --


internationalization

  Change to using gettext(). Probably need to ship this for platforms
  that don't have it.

  Solicit translations.

  Does anyone care? Before we bother modifying the code, we ought to
  get the manual translated first, because that's possibly more useful
  and at any rate demonstrates desire.

  -- --

DEVELOPMENT --------------------------------------------------------

Handling duplicate names

  We need to be careful of duplicate names getting into the file list.
  See clean_flist(). This could happen if multiple arguments include
  the same file. Bad.

  I think duplicates are only a problem if they're both flowing
  through the pipeline at the same time. For example we might have
  updated the first occurrence after reading the checksums for the
  second. So possibly we just need to make sure that we don't have
  both in the pipeline at the same time.

  Possibly if we did one directory at a time that would be sufficient.

  Alternatively we could pre-process the arguments to make sure no
  duplicates will ever be inserted. There could be some bad cases
  when we're collapsing symlinks.

  We could have a hash table.

  The root of the problem is that we do not want more than one file
  list entry referring to the same file. At first glance there are
  several ways this could happen: symlinks, hardlinks, and repeated
  names on the command line.

  If names are repeated on the command line, they may be present in
  different forms, perhaps by traversing directory paths in different
  ways, traversing paths including symlinks. Also we need to allow
  for expansion of globs by rsync.

  At the moment, clean_flist() requires having the entire file list in
  memory. Duplicate names are detected just by a string comparison.
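
  For reference, detecting duplicates "just by a string comparison"
  over the whole list amounts to something like this sketch: sort, then
  drop adjacent equal names. dedup_names() is a hypothetical stand-in,
  not the actual clean_flist(). Like the approach described above, it
  treats "a/b" and "a/./b" as different entries even though they name
  the same file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical stand-in for the clean_flist() idea: sort the names and
 * drop adjacent duplicates by plain string comparison.  Returns the new
 * count; names[0..count-1] holds the unique names in sorted order. */
static size_t dedup_names(const char **names, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    qsort(names, n, sizeof names[0], cmp_names);
    for (size_t i = 1; i < n; i++)
        if (strcmp(names[i], names[out]) != 0)
            names[++out] = names[i];
    return out + 1;
}
```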

  We don't need to worry about hard links causing duplicates because
  files are never updated in place. Similarly for symlinks.

  I think even if we're using a different symlink mode we don't need
  to worry.

  Unless we're really clever this will introduce a protocol
  incompatibility, so we need to be able to accept the old format as
  well.

  -- --


Use generic zlib 2002/02/25

  Perhaps don't use our own zlib.

  Advantages:

  - will automatically be up to date with bugfixes in zlib

  - can leave it out for small rsync on e.g. recovery disks

  - can use a shared library

  - avoids people breaking rsync by trying to do this themselves and
    messing up

  Should we ship zlib for systems that don't have it, or require
  people to install it separately?

  Apparently this will make us incompatible with versions of rsync
  that use the patched version of zlib. Probably the simplest way to
  handle this is to just disable gzip (with a warning) when talking to
  old versions.

  -- --


TDB: 2002/03/12

  Rather than storing the file list in memory, store it in a TDB.

  This *might* make memory usage lower while building the file list.

  Hashtable lookup will mean files are not transmitted in order,
  though... hm.

  This would neatly eliminate one of the major post-fork shared data
  structures.

  -- --


Splint 2002/03/12

  Build rsync with SPLINT to try to find security holes. Add
  annotations as necessary. Keep track of the number of warnings
  found initially, and see how many of them are real bugs, or real
  security bugs. Knowing the percentage of likely hits would be
  really interesting for other projects.

  -- --


Memory debugger

  jra recommends Valgrind:

    http://devel-home.kde.org/~sewardj/

  -- --


Create release script

  Script would:

    Update spec files

    Build tar file; upload

    Send announcement to mailing list and c.o.l.a.

    Make freshmeat announcement

    Update web site

  -- --


Add machines to build farm

  Cygwin (on different versions of Win32?)

  HP-UX variants (via HP?)

  SCO

  -- --

PERFORMANCE ----------------------------------------------------------

File list structure in memory

  Rather than one big array, perhaps have a tree in memory mirroring
  the directory tree.

  This might make sorting much faster! (I'm not sure it's a big CPU
  problem, mind you.)

  It might also reduce memory use in storing repeated directory names
  -- again I'm not sure this is a problem.

  -- --


Traverse just one directory at a time

  Traverse just one directory at a time. Tridge says it's possible.

  At the moment rsync reads the whole file list into memory at the
  start, which makes us use a lot of memory and also not pipeline
  network access as much as we could.

  -- --


Hard-link handling

  At the moment hardlink handling is very expensive, so it's off by
  default. It does not need to be so.

  Since most of the solutions are rather intertwined with the file
  list it is probably better to fix that first, although fixing
  hardlinks is possibly simpler.

  We can rule out hardlinked directories since they will probably
  screw us up in all kinds of ways. They simply should not be used.

  At the moment rsync only cares about hardlinks to regular files. I
  guess you could also use them for sockets, devices and other beasts,
  but I have not seen them.

  When trying to reproduce hard links, we only need to worry about
  files that have more than one name (nlinks>1 && !S_ISDIR).

  The basic point of this is to discover alternate names that refer to
  the same file. All operations, including creating the file and
  writing modifications to it, need only be done for the first name.
  For all later names, we just create the link and then leave it
  alone.
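
  This sketch illustrates the "first name does the work" rule. It
  assumes each file-list entry carries its device, inode, link count and
  mode. Candidate entries (nlink > 1 and not a directory) are sorted by
  (dev, ino), and every later name is pointed at the first one.
  struct entry and find_hard_links() are hypothetical simplifications,
  not rsync's file_struct or its hard-link code. Because qsort() takes
  no context argument, the sketch uses a file-scope pointer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, much-simplified file-list entry (not rsync's). */
struct entry {
    const char *name;
    dev_t dev;
    ino_t ino;
    nlink_t nlink;
    mode_t mode;
    int link_to;            /* index of the first name, or -1 */
};

/* qsort() has no context argument, so the comparator reads this. */
static const struct entry *sort_base;

static int cmp_devino(const void *a, const void *b)
{
    int ia = *(const int *)a, ib = *(const int *)b;
    const struct entry *x = &sort_base[ia], *y = &sort_base[ib];

    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino)
        return x->ino < y->ino ? -1 : 1;
    return ia < ib ? -1 : ia > ib;          /* keep file-list order */
}

/* Point every later name of a multiply-linked non-directory at the
 * first name with the same (dev, ino).  Returns the number of names
 * that only need a link, or -1 if memory runs out. */
static long find_hard_links(struct entry *e, size_t n)
{
    int *idx;
    size_t m = 0;
    long links = 0;

    for (size_t i = 0; i < n; i++)
        e[i].link_to = -1;
    if (n == 0)
        return 0;
    idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (e[i].nlink > 1 && !S_ISDIR(e[i].mode))
            idx[m++] = (int)i;
    sort_base = e;
    qsort(idx, m, sizeof *idx, cmp_devino);
    for (size_t k = 1; k < m; k++) {
        const struct entry *prev = &e[idx[k - 1]];
        struct entry *cur = &e[idx[k]];

        if (cur->dev == prev->dev && cur->ino == prev->ino) {
            cur->link_to = prev->link_to >= 0 ? prev->link_to : idx[k - 1];
            links++;
        }
    }
    free(idx);
    return links;
}
```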

  If hard links are to be preserved:

  Before the generator/receiver fork, the list of files is received
  from the sender (recv_file_list), and a table for detecting hard
  links is built.

  The generator looks for hard links within the file list and does
  not send checksums for them, though it does send other metadata.

  The sender sends the device number and inode with file entries, so
  that files are uniquely identified.

  The receiver goes through and creates hard links (do_hard_links)
  after all data has been written, but before directory permissions
  are set.

  At the moment device and inum are sent as 4-byte integers, which
  will probably cause problems on large filesystems. On Linux the
  kernel uses 64-bit ino_t's internally, and people will soon have
  filesystems big enough to use them. We ought to follow NFS4 in
  using 64-bit device and inode identification, perhaps with a
  protocol version bump.

  Once we've seen all the names for a particular file, we no longer
  need to think about it and we can deallocate the memory.

  We can also have the case where there are links to a file that are
  not in the tree being transferred. There's nothing we can do about
  that. Because we rename the destination into place after writing,
  any hardlinks to the old file are always going to be orphaned. In
  fact that is almost necessary because otherwise we'd get really
  confused if we were generating checksums for one name of a file and
  modifying another.

  At the moment the code seems to make a whole second copy of the file
  list, which seems unnecessary.

  We should have a test case that exercises hard links. Since it
  might be hard to compare ./tls output where the inodes change we
  might need a little program to check whether several names refer to
  the same file.
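
  Such a checker could be as small as the sketch below, which compares
  st_dev and st_ino from lstat() for every name. same_file() is a
  hypothetical name. lstat() is used so that a symlink is not mistaken
  for a hard link.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical test helper: 1 if every path names the same file (same
 * st_dev and st_ino), 0 if not, -1 if any lstat() fails.  lstat() is
 * used so that a symlink is not mistaken for a hard link. */
static int same_file(const char *const *paths, int n)
{
    struct stat first, st;

    if (n <= 0 || lstat(paths[0], &first) != 0)
        return -1;
    for (int i = 1; i < n; i++) {
        if (lstat(paths[i], &st) != 0)
            return -1;
        if (st.st_dev != first.st_dev || st.st_ino != first.st_ino)
            return 0;
    }
    return 1;
}
```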

  -- --


Allow skipping MD4 file_sum 2002/04/08

  If we're doing a local transfer, or using -W, then perhaps don't
  send the file checksum. If we're doing a local transfer, then
  calculating MD4 checksums uses 90% of CPU and is unlikely to be
  useful.

  Indeed for transfers over zlib or ssh we can also rely on the
  transport to have quite strong protection against corruption.

  Perhaps we should have an option to disable this,
  analogous to --whole-file, although it would default to
  disabled. The file checksum takes up a definite space in
  the protocol -- we can either set it to 0, or perhaps just
  leave it out.

  -- --


Accelerate MD4

  Perhaps borrow an assembler MD4 from someone?

  Make sure we call MD4 with properly-sized blocks whenever possible
  to avoid copying into the residue region?

  -- --


String area code

  Test whether this is actually faster than just using malloc(). If
  it's not (anymore), throw it out.

  -- --

TESTING --------------------------------------------------------------

Torture test

  Something that just keeps running rsync continuously over a data set
  likely to generate problems.

  -- --


Cross-test versions 2001/08/22

  Part of the regression suite should be making sure that we
  don't break backwards compatibility: old clients vs new
  servers and so on. Ideally we would test both up and down
  from the current release to all old versions.

  Run current rsync versions against significant past releases.

  We might need to omit broken old versions, or versions in which
  particular functionality is broken.

  It might be sufficient to test downloads from well-known public
  rsync servers running different versions of rsync. This will give
  some testing and also be the most common case for having different
  versions and not being able to upgrade.

  The new --protocol option may help in this.

  -- --


Test on kernel source

  Download all versions of kernel; unpack, sync between them. Also
  sync between uncompressed tarballs. Compare directories after
  transfer.

  Use local mode; ssh; daemon; --whole-file and --no-whole-file.

  Use awk to pull out the 'speedup' number for each transfer. Make
  sure it is >= x.
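
  The same check can be done without awk. This sketch pulls the number
  out of the summary line rsync prints, for example "total size is 232
  speedup is 0.58". parse_speedup() is a hypothetical name, and the
  threshold x is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the awk step: find "speedup is X" in
 * rsync's summary output and return X, or -1.0 if it is missing. */
static double parse_speedup(const char *output)
{
    const char *p = strstr(output, "speedup is ");
    double speedup;

    if (p == NULL || sscanf(p, "speedup is %lf", &speedup) != 1)
        return -1.0;
    return speedup;
}
```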

  -- --


Test large files

  Sparse and non-sparse

  -- --


Create mutator program for testing

  Insert bytes, delete bytes, swap blocks, ...
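
  Here is a sketch of the three primitives, each operating in place on a
  byte buffer. The function names are hypothetical. A real mutator would
  presumably pick operations, offsets and lengths at random, then check
  that rsync reproduces the mutated file exactly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mutator primitives.  buf holds *len bytes of data and
 * has room for cap bytes.  Each returns 0, or -1 if the request does
 * not fit the buffer. */

/* Insert n bytes from src at offset off. */
static int mut_insert(unsigned char *buf, size_t *len, size_t cap,
                      size_t off, const unsigned char *src, size_t n)
{
    if (off > *len || n > cap - *len)
        return -1;
    memmove(buf + off + n, buf + off, *len - off);
    memcpy(buf + off, src, n);
    *len += n;
    return 0;
}

/* Delete n bytes starting at offset off. */
static int mut_delete(unsigned char *buf, size_t *len, size_t off, size_t n)
{
    if (off > *len || n > *len - off)
        return -1;
    memmove(buf + off, buf + off + n, *len - off - n);
    *len -= n;
    return 0;
}

/* Swap two non-overlapping blocks of n bytes at offsets a and b. */
static int mut_swap(unsigned char *buf, size_t len, size_t a, size_t b,
                    size_t n)
{
    if (a > b) {
        size_t t = a;
        a = b;
        b = t;
    }
    if (b > len || n > len - b || a + n > b)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char t = buf[a + i];
        buf[a + i] = buf[b + i];
        buf[b + i] = t;
    }
    return 0;
}
```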

  -- --


Create configure option to enable dangerous tests

  -- --


If tests are skipped, say why.

  -- --


Test daemon feature to disallow particular options.

  -- --


Create pipe program for testing

  Create a pipe program that makes slow/jerky connections for
  testing. Also create versions of read() and write() that corrupt
  the stream or abruptly fail.

  -- --


Create test makefile target for some tests

  Separate makefile target to run rough tests -- or perhaps
  just run them every time?

  -- --


Test "refuse options" works

  What about for --recursive?

  If you specify an unrecognized option here, you should get an error.

  We need a test case for this...

  Was this broken when we changed to popt?

  -- --

RELATED PROJECTS -----------------------------------------------------

rsyncsh

  Write a small emulation of interactive ftp as a Python program
  that calls rsync. Commands such as "cd", "ls", "ls *.c" etc. map
  fairly directly into rsync commands: it just needs to remember the
  current host, directory and so on. We can probably even do
  completion of remote filenames.

  -- --


http://rsync.samba.org/rsync-and-debian/

  -- --


rsyncable gzip patch

  Exhaustive, torturous testing

  Cleanups?

  -- --


rsyncsplit as alternative to real integration with gzip?

  -- --


reverse rsync over HTTP Range

  Goswin Brederlow suggested this on Debian; I think tridge and I
  talked about it previously in relation to rproxy.

  -- --
