X-Git-Url: https://mattmccutchen.net/rsync/rsync.git/blobdiff_plain/4f69fe59c7df335de04bd4409a369885eb31ab2a..bd685982389b78a158921b7839bdeca501338d19:/rsync3.txt diff --git a/rsync3.txt b/rsync3.txt index 15bb7b06..967aa4b5 100644 --- a/rsync3.txt +++ b/rsync3.txt @@ -1,7 +1,7 @@ -*- indented-text -*- -Notes towards a new version of rsync -Martin Pool +Notes towards a new version of rsync +Martin Pool , September 2001. Good things about the current implementation: @@ -13,7 +13,7 @@ Good things about the current implementation: - Fairly reliable. - - The choice of runnning over a plain TCP socket or tunneling over + - The choice of running over a plain TCP socket or tunneling over ssh. - rsync operations are idempotent: you can always run the same @@ -36,6 +36,12 @@ Good things about the current implementation: - You can easily push or pull simply by switching the order of files. + - The "modules" system has some neat features compared to + e.g. Apache's per-directory configuration. In particular, because + you can set a userid and chroot directory, there is strong + protection between different modules. I haven't seen any calls + for a more flexible system. + Bad things about the current implementation: @@ -45,7 +51,7 @@ Bad things about the current implementation: hard to modify/extend - Both the program and the protocol assume a single non-interactive - one-way transfer + one-way transfer - A list of all files are held in memory for the entire transfer, which cripples scalability to large file trees @@ -64,6 +70,13 @@ Bad things about the current implementation: - Error messages can be cryptic. + - Default behaviour is not intuitive: in too many cases rsync will + happily do nothing. Perhaps -a should be the default? + + - People get confused by trailing slashes, though it's hard to think + of another reasonable way to make this necessary distinction + between a directory and its contents. 
+ Protocol philosophy: @@ -75,7 +88,7 @@ Protocol philosophy: Questionable features: - These are neat, but not necessarily clean or worth preserving. + These are neat, but not necessarily clean or worth preserving. - The remote rsync can be wrapped by some other program, such as in tridge's rsync-mail scripts. The general feature of sending and @@ -87,7 +100,7 @@ Desirable features: These don't really require architectural changes; they're just something to keep in mind. - + - Synchronize ACLs and extended attributes - Anonymous servers should be efficient @@ -109,16 +122,54 @@ Desirable features: Alternatively, as long as transfers are idempotent, we can just restart the whole thing. [NFSv4] - - Scripting support. + - Scripting support. - Propagate atimes and do not modify them. This is very ugly on Unix. It might be better to try to add O_NOATIME to kernels, and call that. - - VFS. Useful? - - Unicode. Probably just use UTF-8 for everything. + - Open authentication system. Can we use PAM? Is SASL an adequate + mapping of PAM to the network, or useful in some other way? + + - Resume interrupted transfers without the --partial flag. We need + to leave the temporary file behind, and then know to use it. This + leaves a risk of large temporary files accumulating, which is not + good. Perhaps it should be off by default. + + - tcpwrappers support. Should be trivial; can already be done + through tcpd or inetd. + + - Socks support built in. It's not clear this is any better than + just linking against the socks library, though. + + - When run over SSH, invoke with predictable command-line arguments, + so that people can restrict what commands sshd will run. (Is this + really required?) + + - Comparison mode: give a list of which files are new, gone, or + different. Set return code depending on whether anything has + changed. + + - Internationalized messages (gettext?) + + - Optionally use real regexps rather than globs? + + - Show overall progress. 
Pretty hard to do, especially if we insist + on not scanning the directory tree up front. + + +Regression testing: + + - Support automatic testing. + + - Have hard internal timeouts against hangs. + + - Be deterministic. + + - Measure performance. + Hard links: @@ -131,6 +182,14 @@ Hard links: become known. +Command-line options: + + We have rather a lot at the moment. We might get more if the tool + becomes more flexible. Do we need a .rc or configuration file? + That wouldn't really fit with its pattern of use: cp and tar don't + have them, though ssh does. + + Scripting issues: - Perhaps support multiple scripting languages: candidates include @@ -144,6 +203,19 @@ Scripting issues: it's not running in the users own account. So we can either disallow it, or use some kind of sandbox system. + - Python is a good language, but the syntax is not so good for + giving small fragments on the command line. + + - Tcl is broken Lisp. + + - Lots of sysadmins know Perl, though Perl can give some bizarre or + confusing errors. The built in stat operators and regexps might + be useful. + + - Sadly probably not enough people know Scheme. + + - sh is hard to embed. + Scripting hooks: @@ -152,13 +224,33 @@ Scripting hooks: - What basis file to use - Logging - + - Whether to allow transfers (for public servers) - Authentication - Locking + - Cache + + - Generating backup path/name. + + - Post-processing of backups, e.g. to do compression. + + - After transfer, before replacement: so that we can spit out a diff + of what was changed, or kick off some kind of reconciliation + process. + + +VFS: + + Rather than talking straight to the filesystem, rsyncd talks through + an internal API. Samba has one. Is it useful? + + - Could be a tidy way to implement cached signatures. + + - Keep files compressed on disk? 
+ Interactive interface: @@ -169,17 +261,21 @@ Interactive interface: - The standalone process needs to produce output in a form easily digestible by a calling program, like the --emacs feature some - have. + have. Same goes for output: rpm outputs a series of hash symbols, + which are easier for a GUI to handle than "\r30% complete" + strings. - Yow! emacs support. (You could probably build that already, of - course.) + course.) I'd like to be able to write a simple script on a remote + machine that rsyncs it to my workstation, edits it there, then + pushes it back up. Pie-in-the-sky features: These might have a severe impact on the protocol, and are not clearly in our core requirements. It looks like in many of them - having scripting hooks will allow us + having scripting hooks will allow us - Transport over UDP multicast. The hard part is handling multiple destinations which have different basis files. We can look at @@ -203,6 +299,32 @@ Pie-in-the-sky features: with replication in place, though on some systems we will also have to do I/O on block boundaries. + - Peer to peer features. Flavour of the year. Can we think about + ways for clients to smoothly and voluntarily become servers for + content they receive? + + - Imagine a situation where the destination has a much faster link + to the cloud than the source. In this case, Mojo Nation downloads + interleaved blocks from several slower servers. The general + situation might be a way for a master rsync process to farm out + tasks to several subjobs. In this particular case they'd need + different sockets. This might be related to multicast. + + +Unlikely features: + + - Allow remote source and destination. If this can be cleanly + designed into the protocol, perhaps with the remote machine acting + as a kind of echo, then it's good. It's uncommon enough that we + don't want to shape the whole protocol around it, though. 
+ + In fact, in a triangle of machines there are two possibilities: + all traffic passes from remote1 to remote2 through local, or local + just sets up the transfer and then remote1 talks to remote2. FTP + supports the second but it's not clearly good. There are some + security problems with being able to instruct one machine to open + a connection to another. + In favour of evolving the protocol: @@ -222,7 +344,7 @@ In favour of using a new protocol: - If we start from scratch, it can be documented as we go, and we can avoid design decisions that make the protocol complex or - implementation-bound. + implementation-bound. Error handling: @@ -243,7 +365,7 @@ Concurrency: - We can do nonblocking network IO, but not so for disk. - It makes sense to on the destination be generating signatures and - applying patches at the same time. + applying patches at the same time. - Can structure this with nonblocking, threads, separate processes, etc. @@ -259,7 +381,7 @@ Uses: http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764 - Sync with PDA - + - Network backup systems - CVS filemover @@ -274,7 +396,7 @@ Conflict resolution: would be useful. -Moved files: +Moved files: - There's no trivial way to detect renamed files, especially if they move between directories. @@ -290,19 +412,26 @@ Moved files: Filesystem migration: + NFSv4 probably wants to migrate file locks, but that's not really + our problem. + + +Atomic updates: + The NFSv4 working group wants atomic migration. Most of the - responsibility for this lies on the NFS server or OS. + responsibility for this lies on the NFS server or OS. If migrating a whole tree, then we could do a nearly-atomic rename at the end. This ties in to having separate basis and destination files. - NFSv4 probably wants to migrate file locks, but that's not really - our problem. + There's no way in Unix to replace a whole set of files atomically. 
+ However, if we get them all onto the destination machine and then do + the updates quickly it would greatly reduce the window. Scalability: - + We should aim to work well on machines in use in a year or two. That probably means transfers of many millions of files in one batch, and gigabytes or terabytes of data. @@ -314,6 +443,8 @@ Scalability: On the whole CPU usage is not normally a limiting factor, if only because running over SSH burns a lot of cycles on encryption. + Perhaps have resource throttling without relying on rlimit. + Streaming: @@ -322,3 +453,17 @@ Streaming: pipelined. This is a problem with FTP, and NFS (at least up to v3). NFSv4 can pipeline operations, but building on that is probably a bit complicated. + + +Related work: + + - mirror.pl http://freshmeat.net/project/mirror/ + + - ProFTPd + + - Apache + + - http://freshmeat.net/search/?site=Freshmeat&q=mirror&section=projects + + - BitTorrent -- p2p mirroring + http://bitconjurer.org/BitTorrent/