From 3c6cd53b238daeeb5ba2afa87616df7cc90a429b Mon Sep 17 00:00:00 2001
From: Martin Pool
Date: Wed, 12 Sep 2001 14:20:44 +0000
Subject: [PATCH] Think think.

---
 rsync3.txt | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 146 insertions(+), 8 deletions(-)

diff --git a/rsync3.txt b/rsync3.txt
index 15bb7b06..21ebaf6d 100644
--- a/rsync3.txt
+++ b/rsync3.txt
@@ -1,7 +1,7 @@
 -*- indented-text -*-
 
 Notes towards a new version of rsync
-Martin Pool
+Martin Pool , September 2001.
 
 
 Good things about the current implementation:
@@ -36,6 +36,12 @@ Good things about the current implementation:
   - You can easily push or pull simply by switching the order of
     files.
 
+  - The "modules" system has some neat features compared to
+    e.g. Apache's per-directory configuration.  In particular, because
+    you can set a userid and chroot directory, there is strong
+    protection between different modules.  I haven't seen any calls
+    for a more flexible system.
+
 
 Bad things about the current implementation:
 
@@ -64,6 +70,13 @@ Bad things about the current implementation:
 
   - Error messages can be cryptic.
 
+  - Default behaviour is not intuitive: in too many cases rsync will
+    happily do nothing.  Perhaps -a should be the default?
+
+  - People get confused by trailing slashes, though it's hard to think
+    of another reasonable way to make this necessary distinction
+    between a directory and its contents.
+
 
 Protocol philosophy:
 
@@ -115,10 +128,48 @@ Desirable features:
     Unix.  It might be better to try to add O_NOATIME to kernels, and
     call that.
 
-  - VFS.  Useful?
-
   - Unicode.  Probably just use UTF-8 for everything.
 
+  - Open authentication system.  Can we use PAM?  Is SASL an adequate
+    mapping of PAM to the network, or useful in some other way?
+
+  - Resume interrupted transfers without the --partial flag.  We need
+    to leave the temporary file behind, and then know to use it.  This
+    leaves a risk of large temporary files accumulating, which is not
+    good.  Perhaps it should be off by default.
+
+  - tcpwrappers support.  Should be trivial; can already be done
+    through tcpd or inetd.
+
+  - Socks support built in.  It's not clear this is any better than
+    just linking against the socks library, though.
+
+  - When run over SSH, invoke with predictable command-line arguments,
+    so that people can restrict what commands sshd will run.  (Is this
+    really required?)
+
+  - Comparison mode: give a list of which files are new, gone, or
+    different.  Set return code depending on whether anything has
+    changed.
+
+  - Internationalized messages (gettext?)
+
+  - Optionally use real regexps rather than globs?
+
+  - Show overall progress.  Pretty hard to do, especially if we insist
+    on not scanning the directory tree up front.
+
+
+Regression testing:
+
+  - Support automatic testing.
+
+  - Have hard internal timeouts against hangs.
+
+  - Be deterministic.
+
+  - Measure performance.
+
 
 Hard links:
 
@@ -131,6 +182,14 @@ Hard links:
     become known.
 
 
+Command-line options:
+
+  We have rather a lot at the moment.  We might get more if the tool
+  becomes more flexible.  Do we need a .rc or configuration file?
+  That wouldn't really fit with its pattern of use: cp and tar don't
+  have them, though ssh does.
+
+
 Scripting issues:
 
   - Perhaps support multiple scripting languages: candidates include
@@ -144,6 +203,19 @@ Scripting issues:
     it's not running in the users own account.  So we can either
     disallow it, or use some kind of sandbox system.
 
+  - Python is a good language, but the syntax is not so good for
+    giving small fragments on the command line.
+
+  - Tcl is broken Lisp.
+
+  - Lots of sysadmins know Perl, though Perl can give some bizarre or
+    confusing errors.  The built in stat operators and regexps might
+    be useful.
+
+  - Sadly probably not enough people know Scheme.
+
+  - sh is hard to embed.
+
 
 Scripting hooks:
 
@@ -159,6 +231,26 @@ Scripting hooks:
 
   - Locking
 
+  - Cache
+
+  - Generating backup path/name.
+
+  - Post-processing of backups, e.g. to do compression.
+
+  - After transfer, before replacement: so that we can spit out a diff
+    of what was changed, or kick off some kind of reconciliation
+    process.
+
+
+VFS:
+
+  Rather than talking straight to the filesystem, rsyncd talks through
+  an internal API.  Samba has one.  Is it useful?
+
+  - Could be a tidy way to implement cached signatures.
+
+  - Keep files compressed on disk?
+
 
 Interactive interface:
 
@@ -169,10 +261,14 @@ Interactive interface:
 
   - The standalone process needs to produce output in a form easily
     digestible by a calling program, like the --emacs feature some
-    have.
+    have.  Same goes for output: rpm outputs a series of hash symbols,
+    which are easier for a GUI to handle than "\r30% complete"
+    strings.
 
   - Yow!  emacs support.  (You could probably build that already, of
-    course.)
+    course.)  I'd like to be able to write a simple script on a remote
+    machine that rsyncs it to my workstation, edits it there, then
+    pushes it back up.
 
 
 Pie-in-the-sky features:
@@ -203,6 +299,25 @@ Pie-in-the-sky features:
     with replication in place, though on some systems we will also
     have to do I/O on block boundaries.
 
+  - Peer to peer features.  Flavour of the year.  Can we think about
+    ways for clients to smoothly and voluntarily become servers for
+    content they receive?
+
+
+Unlikely features:
+
+  - Allow remote source and destination.  If this can be cleanly
+    designed into the protocol, perhaps with the remote machine acting
+    as a kind of echo, then it's good.  It's uncommon enough that we
+    don't want to shape the whole protocol around it, though.
+
+    In fact, in a triangle of machines there are two possibilities:
+    all traffic passes from remote1 to remote2 through local, or local
+    just sets up the transfer and then remote1 talks to remote2.  FTP
+    supports the second but it's not clearly good.  There are some
+    security problems with being able to instruct one machine to open
+    a connection to another.
+
 
 In favour of evolving the protocol:
 
@@ -274,7 +389,7 @@ Conflict resolution:
     would be useful.
 
 
-Moved files:
+Moved files:
 
   - There's no trivial way to detect renamed files, especially if they
     move between directories.
@@ -290,6 +405,12 @@ Moved files:
 
 Filesystem migration:
 
+  NFSv4 probably wants to migrate file locks, but that's not really
+  our problem.
+
+
+Atomic updates:
+
   The NFSv4 working group wants atomic migration.  Most of the
   responsibility for this lies on the NFS server or OS.
 
@@ -297,8 +418,9 @@ Filesystem migration:
   at the end.  This ties in to having separate basis and destination
   files.
 
-  NFSv4 probably wants to migrate file locks, but that's not really
-  our problem.
+  There's no way in Unix to replace a whole set of files atomically.
+  However, if we get them all onto the destination machine and then do
+  the updates quickly it would greatly reduce the window.
 
 
 Scalability:
@@ -314,6 +436,8 @@ Scalability:
   On the whole CPU usage is not normally a limiting factor, if only
   because running over SSH burns a lot of cycles on encryption.
 
+  Perhaps have resource throttling without relying on rlimit.
+
 
 Streaming:
 
@@ -322,3 +446,17 @@ Streaming:
   pipelined.  This is a problem with FTP, and NFS (at least up to v3).
   NFSv4 can pipeline operations, but building on that is probably a
   bit complicated.
+
+
+Related work:
+
+  - mirror.pl http://freshmeat.net/project/mirror/
+
+  - ProFTPd
+
+  - Apache
+
+  - http://freshmeat.net/search/?site=Freshmeat&q=mirror&section=projects
+
+  - BitTorrent -- p2p mirroring
+    http://bitconjurer.org/BitTorrent/
\ No newline at end of file
-- 
2.34.1