-*- indented-text -*-
-Notes towards a new version of rsync
+Notes towards a new version of rsync
Martin Pool <mbp@samba.org>, September 2001.
- Fairly reliable.
- - The choice of runnning over a plain TCP socket or tunneling over
+ - The choice of running over a plain TCP socket or tunneling over
ssh.
- rsync operations are idempotent: you can always run the same
hard to modify/extend
- Both the program and the protocol assume a single non-interactive
- one-way transfer
+ one-way transfer
- A list of all files are held in memory for the entire transfer,
which cripples scalability to large file trees
Questionable features:
- These are neat, but not necessarily clean or worth preserving.
+ These are neat, but not necessarily clean or worth preserving.
- The remote rsync can be wrapped by some other program, such as in
tridge's rsync-mail scripts. The general feature of sending and
These don't really require architectural changes; they're just
something to keep in mind.
-
+
- Synchronize ACLs and extended attributes
- Anonymous servers should be efficient
Alternatively, as long as transfers are idempotent, we can just
restart the whole thing. [NFSv4]
- - Scripting support.
+ - Scripting support.
- Propagate atimes and do not modify them. This is very ugly on
Unix. It might be better to try to add O_NOATIME to kernels, and
- What basis file to use
- Logging
-
+
- Whether to allow transfers (for public servers)
- Authentication
These might have a severe impact on the protocol, and are not
clearly in our core requirements. It looks like in many of them
- having scripting hooks will allow us
+ having scripting hooks will allow us
- Transport over UDP multicast. The hard part is handling multiple
destinations which have different basis files. We can look at
ways for clients to smoothly and voluntarily become servers for
content they receive?
+ - Imagine a situation where the destination has a much faster link
+ to the cloud than the source. In this case, Mojo Nation downloads
+ interleaved blocks from several slower servers. The general
+ situation might be a way for a master rsync process to farm out
+ tasks to several subjobs. In this particular case they'd need
+ different sockets. This might be related to multicast.
+
Unlikely features:
- If we start from scratch, it can be documented as we go, and we
can avoid design decisions that make the protocol complex or
- implementation-bound.
+ implementation-bound.
Error handling:
- We can do nonblocking network IO, but not so for disk.
- It makes sense to on the destination be generating signatures and
- applying patches at the same time.
+ applying patches at the same time.
- Can structure this with nonblocking, threads, separate processes,
etc.
http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764
- Sync with PDA
-
+
- Network backup systems
- CVS filemover
Atomic updates:
The NFSv4 working group wants atomic migration. Most of the
- responsibility for this lies on the NFS server or OS.
+ responsibility for this lies on the NFS server or OS.
If migrating a whole tree, then we could do a nearly-atomic rename
at the end. This ties in to having separate basis and destination
There's no way in Unix to replace a whole set of files atomically.
However, if we get them all onto the destination machine and then do
- the updates quickly it would greatly reduce the window.
+ the updates quickly it would greatly reduce the window.
Scalability:
-
+
We should aim to work well on machines in use in a year or two.
That probably means transfers of many millions of files in one
batch, and gigabytes or terabytes of data.
- http://freshmeat.net/search/?site=Freshmeat&q=mirror&section=projects
- BitTorrent -- p2p mirroring
- http://bitconjurer.org/BitTorrent/
\ No newline at end of file
+ http://bitconjurer.org/BitTorrent/