| 1 | -*- indented-text -*- |
| 2 | |
| 3 | Notes towards a new version of rsync |
| 4 | Martin Pool <mbp@samba.org>, September 2001. |
| 5 | |
| 6 | |
| 7 | Good things about the current implementation: |
| 8 | |
| 9 | - Widely known and adopted. |
| 10 | |
| 11 | - Fast/efficient, especially for moderately small sets of files over |
| 12 | slow links (transoceanic or modem). |
| 13 | |
| 14 | - Fairly reliable. |
| 15 | |
| 16 | - The choice of running over a plain TCP socket or tunneling over |
| 17 | ssh. |
| 18 | |
| 19 | - rsync operations are idempotent: you can always run the same |
| 20 | command twice to make sure it worked properly without any fear. |
| 21 | (Are there any exceptions?) |
| 22 | |
| 23 | - Small changes to files cause small deltas. |
| 24 | |
| 25 | - There is a way to evolve the protocol to some extent. |
| 26 | |
| 27 | - rdiff and rsync --write-batch allow generation of standalone patch |
| 28 | sets. rsync+ is pretty cheesy, though. xdelta seems cleaner. |
| 29 | |
| 30 | - Process triangle is creative, but seems to provoke OS bugs. |
| 31 | |
| 32 | - "Morning-after property": you don't need to know anything on the |
| 33 | local machine about the state of the remote machine, or about |
| 34 | transfers that have been done in the past. |
| 35 | |
| 36 | - You can easily push or pull simply by switching the order of |
| 37 | files. |
| 38 | |
| 39 | - The "modules" system has some neat features compared to |
| 40 | e.g. Apache's per-directory configuration. In particular, because |
| 41 | you can set a userid and chroot directory, there is strong |
| 42 | protection between different modules. I haven't seen any calls |
| 43 | for a more flexible system. |
| 44 | |
| 45 | |
| 46 | Bad things about the current implementation: |
| 47 | |
| 48 | - Persistent and hard-to-diagnose hang bugs remain |
| 49 | |
| 50 | - Protocol is sketchily documented, tied to this implementation, and |
| 51 | hard to modify/extend |
| 52 | |
| 53 | - Both the program and the protocol assume a single non-interactive |
| 54 | one-way transfer |
| 55 | |
| 56 | - A list of all files is held in memory for the entire transfer, |
| 57 | which cripples scalability to large file trees |
| 58 | |
| 59 | - Opening a new socket for every operation causes problems, |
| 60 | especially when running over SSH with password authentication. |
| 61 | |
| 62 | - Renamed files are not handled: the old file is removed, and the |
| 63 | new file created from scratch. |
| 64 | |
| 65 | - The versioning approach assumes that future versions of the |
| 66 | program know about all previous versions, and will do the right |
| 67 | thing. |
| 68 | |
| 69 | - People always get confused about ':' vs '::' |
| 70 | |
| 71 | - Error messages can be cryptic. |
| 72 | |
| 73 | - Default behaviour is not intuitive: in too many cases rsync will |
| 74 | happily do nothing. Perhaps -a should be the default? |
| 75 | |
| 76 | - People get confused by trailing slashes, though it's hard to think |
| 77 | of another reasonable way to make this necessary distinction |
| 78 | between a directory and its contents. |
| 79 | |
| 80 | |
| 81 | Protocol philosophy: |
| 82 | |
| 83 | *The* big difference between rsync and protocols like HTTP, FTP, and |
| 84 | NFS is that their fundamental operations are "read this file", |
| 85 | "delete this file", and "make this directory", whereas rsync's is |
| 86 | "make this directory like this one". |
| 87 | |
| 88 | |
| 89 | Questionable features: |
| 90 | |
| 91 | These are neat, but not necessarily clean or worth preserving. |
| 92 | |
| 93 | - The remote rsync can be wrapped by some other program, such as in |
| 94 | tridge's rsync-mail scripts. The general feature of sending and |
| 95 | retrieving mail over rsync is good, but this is perhaps not the |
| 96 | right way to implement it. |
| 97 | |
| 98 | |
| 99 | Desirable features: |
| 100 | |
| 101 | These don't really require architectural changes; they're just |
| 102 | something to keep in mind. |
| 103 | |
| 104 | - Synchronize ACLs and extended attributes |
| 105 | |
| 106 | - Anonymous servers should be efficient |
| 107 | |
| 108 | - Code should be portable to non-UNIX systems |
| 109 | |
| 110 | - Should be possible to document the protocol in RFC form |
| 111 | |
| 112 | - --dry-run option |
| 113 | |
| 114 | - IPv6 support. Pretty straightforward. |
| 115 | |
| 116 | - Allow the basis and destination files to be different. For |
| 117 | example, you could use this when you have a CD-ROM and want to |
| 118 | download an updated image onto a hard drive. |
| 119 | |
| 120 | - Efficiently interrupt and restart a transfer. We can write a |
| 121 | checkpoint file that says where we're up to in the filesystem. |
| 122 | Alternatively, as long as transfers are idempotent, we can just |
| 123 | restart the whole thing. [NFSv4] |
| 124 | |
| 125 | - Scripting support. |
| 126 | |
| 127 | - Propagate atimes and do not modify them. This is very ugly on |
| 128 | Unix. It might be better to try to add O_NOATIME to kernels, and |
| 129 | call that. |
| 130 | |
| 131 | - Unicode. Probably just use UTF-8 for everything. |
| 132 | |
| 133 | - Open authentication system. Can we use PAM? Is SASL an adequate |
| 134 | mapping of PAM to the network, or useful in some other way? |
| 135 | |
| 136 | - Resume interrupted transfers without the --partial flag. We need |
| 137 | to leave the temporary file behind, and then know to use it. This |
| 138 | leaves a risk of large temporary files accumulating, which is not |
| 139 | good. Perhaps it should be off by default. |
| 140 | |
| 141 | - tcpwrappers support. Should be trivial; can already be done |
| 142 | through tcpd or inetd. |
| 143 | |
| 144 | - Socks support built in. It's not clear this is any better than |
| 145 | just linking against the socks library, though. |
| 146 | |
| 147 | - When run over SSH, invoke with predictable command-line arguments, |
| 148 | so that people can restrict what commands sshd will run. (Is this |
| 149 | really required?) |
| 150 | |
| 151 | - Comparison mode: give a list of which files are new, gone, or |
| 152 | different. Set return code depending on whether anything has |
| 153 | changed. |
| 154 | |
| 155 | - Internationalized messages (gettext?) |
| 156 | |
| 157 | - Optionally use real regexps rather than globs? |
| 158 | |
| 159 | - Show overall progress. Pretty hard to do, especially if we insist |
| 160 | on not scanning the directory tree up front. |
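
The comparison mode item above could be sketched roughly as follows, in
Python for brevity (rsync itself is written in C).  The function names
and the 0/1 return code convention are assumptions made for
illustration, not existing rsync behaviour.  Files are compared by size
and mtime only, so this is a quick check rather than a content
comparison.

```python
import os


def list_tree(root):
    """Map each non-directory entry's relative path to (size, whole-second mtime)."""
    listing = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            listing[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return listing


def compare_trees(source, dest):
    """Return (new, gone, different) as sorted lists of relative paths."""
    src, dst = list_tree(source), list_tree(dest)
    new = sorted(p for p in src if p not in dst)
    gone = sorted(p for p in dst if p not in src)
    different = sorted(p for p in src if p in dst and src[p] != dst[p])
    return new, gone, different


def exit_status(new, gone, different):
    """0 if nothing differs, 1 otherwise (a hypothetical convention)."""
    return 1 if (new or gone or different) else 0
```

Note that this scans both trees up front, which is exactly what the
scalability notes would like to avoid; a real version would have to
stream the listing.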
| 161 | |
| 162 | |
| 163 | Regression testing: |
| 164 | |
| 165 | - Support automatic testing. |
| 166 | |
| 167 | - Have hard internal timeouts against hangs. |
| 168 | |
| 169 | - Be deterministic. |
| 170 | |
| 171 | - Measure performance. |
| 172 | |
| 173 | |
| 174 | Hard links: |
| 175 | |
| 176 | At the moment, we can recreate hard links, but it's a bit |
| 177 | inefficient: it depends on holding a list of all files in the tree. |
| 178 | Every time we see a file with a linkcount >1, we need to search for |
| 179 | another known name that has the same (fsid,inum) tuple. We could do |
| 180 | that more efficiently by keeping a list of only files with |
| 181 | linkcount>1, and removing files from that list as all their names |
| 182 | become known. |
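
A minimal sketch of that bookkeeping, in Python for brevity (rsync
itself is written in C).  The HardLinkTracker class and its method
names are hypothetical.  It remembers only inodes with a link count
above one, keyed on the (fsid,inum) pair as reported by st_dev and
st_ino, and drops an entry once all of its names have been seen.

```python
class HardLinkTracker:
    """Remember only files whose link count is greater than one."""

    def __init__(self):
        # (st_dev, st_ino) -> [first name seen, number of names not yet seen]
        self._pending = {}

    def observe(self, path, st):
        """Return an earlier name for the same inode, or None.

        `st` is an os.stat_result, or any object with st_dev, st_ino
        and st_nlink attributes.
        """
        if st.st_nlink <= 1:
            return None  # ordinary file: nothing to remember
        key = (st.st_dev, st.st_ino)
        entry = self._pending.get(key)
        if entry is None:
            # First name for this inode; st_nlink - 1 names remain.
            self._pending[key] = [path, st.st_nlink - 1]
            return None
        first_name = entry[0]
        entry[1] -= 1
        if entry[1] == 0:
            # All names are now known, so forget this inode.
            del self._pending[key]
        return first_name

    def __len__(self):
        return len(self._pending)
```

If some of a file's links live outside the tree being transferred, its
entry never drains, so the table is bounded by the number of multiply
linked inodes rather than by the number of files.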
| 183 | |
| 184 | |
| 185 | Command-line options: |
| 186 | |
| 187 | We have rather a lot at the moment. We might get more if the tool |
| 188 | becomes more flexible. Do we need a .rc or configuration file? |
| 189 | That wouldn't really fit with its pattern of use: cp and tar don't |
| 190 | have them, though ssh does. |
| 191 | |
| 192 | |
| 193 | Scripting issues: |
| 194 | |
| 195 | - Perhaps support multiple scripting languages: candidates include |
| 196 | Perl, Python, Tcl, Scheme (guile?), sh, ... |
| 197 | |
| 198 | - Simply running a subprocess and looking at its stdout/exit code |
| 199 | might be sufficient, though it could also be pretty slow if it's |
| 200 | called often. |
| 201 | |
| 202 | - There are security issues about running remote code, at least if |
| 203 | it's not running in the user's own account. So we can either |
| 204 | disallow it, or use some kind of sandbox system. |
| 205 | |
| 206 | - Python is a good language, but the syntax is not so good for |
| 207 | giving small fragments on the command line. |
| 208 | |
| 209 | - Tcl is broken Lisp. |
| 210 | |
| 211 | - Lots of sysadmins know Perl, though Perl can give some bizarre or |
| 212 | confusing errors. The built-in stat operators and regexps might |
| 213 | be useful. |
| 214 | |
| 215 | - Sadly probably not enough people know Scheme. |
| 216 | |
| 217 | - sh is hard to embed. |
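
The subprocess approach mentioned above might look something like this
sketch.  The function name, the convention that exit status 0 means
"transfer this file", and the example hook are all assumptions made
for illustration.  Only the exit code is examined here, not stdout.

```python
import subprocess
import sys


def hook_allows_transfer(hook_argv, path, timeout=30.0):
    """Run an external hook with `path` appended to its arguments.

    Exit status 0 means "transfer this file"; anything else means skip.
    subprocess.TimeoutExpired is raised if the hook hangs.
    """
    result = subprocess.run(hook_argv + [path], timeout=timeout)
    return result.returncode == 0


# Example hook: skip object files.  Written as a Python one-liner so the
# sketch does not depend on any particular shell being installed.
SKIP_OBJECT_FILES = [
    sys.executable, "-c",
    "import sys; sys.exit(1 if sys.argv[1].endswith('.o') else 0)",
]
```

This starts one process per file, which is where the "pretty slow if
it's called often" worry comes from.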
| 218 | |
| 219 | |
| 220 | Scripting hooks: |
| 221 | |
| 222 | - Whether to transfer a file |
| 223 | |
| 224 | - What basis file to use |
| 225 | |
| 226 | - Logging |
| 227 | |
| 228 | - Whether to allow transfers (for public servers) |
| 229 | |
| 230 | - Authentication |
| 231 | |
| 232 | - Locking |
| 233 | |
| 234 | - Cache |
| 235 | |
| 236 | - Generating backup path/name. |
| 237 | |
| 238 | - Post-processing of backups, e.g. to do compression. |
| 239 | |
| 240 | - After transfer, before replacement: so that we can spit out a diff |
| 241 | of what was changed, or kick off some kind of reconciliation |
| 242 | process. |
| 243 | |
| 244 | |
| 245 | VFS: |
| 246 | |
| 247 | Rather than talking straight to the filesystem, rsyncd talks through |
| 248 | an internal API. Samba has one. Is it useful? |
| 249 | |
| 250 | - Could be a tidy way to implement cached signatures. |
| 251 | |
| 252 | - Keep files compressed on disk? |
| 253 | |
| 254 | |
| 255 | Interactive interface: |
| 256 | |
| 257 | - Something like ncFTP, or integration into GNOME-vfs. Probably |
| 258 | hold a single socket connection open. |
| 259 | |
| 260 | - Can either call us as a separate process, or as a library. |
| 261 | |
| 262 | - The standalone process needs to produce output in a form easily |
| 263 | digestible by a calling program, like the --emacs feature some |
| 264 | programs have. Same goes for progress output: rpm prints a series of |
| 265 | hash symbols, which are easier for a GUI to handle than |
| 266 | "\r30% complete" strings. |
| 267 | |
| 268 | - Yow! emacs support. (You could probably build that already, of |
| 269 | course.) I'd like to be able to write a simple script on a remote |
| 270 | machine that rsyncs it to my workstation, edits it there, then |
| 271 | pushes it back up. |
| 272 | |
| 273 | |
| 274 | Pie-in-the-sky features: |
| 275 | |
| 276 | These might have a severe impact on the protocol, and are not |
| 277 | clearly in our core requirements. It looks like in many of them |
| 278 | having scripting hooks will allow us |
| 279 | |
| 280 | - Transport over UDP multicast. The hard part is handling multiple |
| 281 | destinations which have different basis files. We can look at |
| 282 | multicast-TFTP for inspiration. |
| 283 | |
| 284 | - Conflict resolution. Possibly general scripting support will be |
| 285 | sufficient. |
| 286 | |
| 287 | - Integrate with locking. It's hard to see a good general solution, |
| 288 | because Unix systems have several locking mechanisms, and grabbing |
| 289 | the lock from programs that don't expect it could cause deadlocks, |
| 290 | timeouts, or other problems. Scripting support might help. |
| 291 | |
| 292 | - Replicate in place, rather than to a temporary file. This is |
| 293 | dangerous in the case of interruption, and it also means that the |
| 294 | delta can't refer to blocks that have already been overwritten. |
| 295 | On the other hand we could semi-trivially do this at first by |
| 296 | simply generating a delta with no copy instructions. |
| 297 | |
| 298 | - Replicate block devices. Most of the difficulties here are to do |
| 299 | with replication in place, though on some systems we will also |
| 300 | have to do I/O on block boundaries. |
| 301 | |
| 302 | - Peer to peer features. Flavour of the year. Can we think about |
| 303 | ways for clients to smoothly and voluntarily become servers for |
| 304 | content they receive? |
| 305 | |
| 306 | - Imagine a situation where the destination has a much faster link |
| 307 | to the cloud than the source. In this case, Mojo Nation downloads |
| 308 | interleaved blocks from several slower servers. The general |
| 309 | situation might be a way for a master rsync process to farm out |
| 310 | tasks to several subjobs. In this particular case they'd need |
| 311 | different sockets. This might be related to multicast. |
| 312 | |
| 313 | |
| 314 | Unlikely features: |
| 315 | |
| 316 | - Allow remote source and destination. If this can be cleanly |
| 317 | designed into the protocol, perhaps with the remote machine acting |
| 318 | as a kind of echo, then it's good. It's uncommon enough that we |
| 319 | don't want to shape the whole protocol around it, though. |
| 320 | |
| 321 | In fact, in a triangle of machines there are two possibilities: |
| 322 | all traffic passes from remote1 to remote2 through local, or local |
| 323 | just sets up the transfer and then remote1 talks to remote2. FTP |
| 324 | supports the second but it's not clearly good. There are some |
| 325 | security problems with being able to instruct one machine to open |
| 326 | a connection to another. |
| 327 | |
| 328 | |
| 329 | In favour of evolving the protocol: |
| 330 | |
| 331 | - Keeping compatibility with existing rsync servers will help with |
| 332 | adoption and testing. |
| 333 | |
| 334 | - We should at the very least be able to fall back to the old |
| 335 | protocol. |
| 336 | |
| 337 | - Error handling is not so good. |
| 338 | |
| 339 | |
| 340 | In favour of using a new protocol: |
| 341 | |
| 342 | - Maintaining compatibility might soak up development time that |
| 343 | would better go into improving a new protocol. |
| 344 | |
| 345 | - If we start from scratch, it can be documented as we go, and we |
| 346 | can avoid design decisions that make the protocol complex or |
| 347 | implementation-bound. |
| 348 | |
| 349 | |
| 350 | Error handling: |
| 351 | |
| 352 | - Errors should come back reliably, and be clearly associated with |
| 353 | the particular file that caused the problem. |
| 354 | |
| 355 | - Some errors ought to cause the whole transfer to abort; some are |
| 356 | just warnings. If any errors have occurred, then rsync ought to |
| 357 | return an error. |
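
One way to picture these two rules is the sketch below.  The class
names, the severity labels and the exit code of 1 are hypothetical and
do not correspond to rsync's actual exit codes.

```python
from dataclasses import dataclass

WARNING = "warning"
FATAL = "fatal"


@dataclass
class TransferError:
    path: str       # the file that caused the problem
    severity: str   # WARNING or FATAL
    message: str


class FatalTransferError(Exception):
    """Raised to abort the whole transfer."""


class ErrorLog:
    def __init__(self):
        self.errors = []

    def report(self, path, severity, message):
        """Record an error against a file; abort the transfer if it is fatal."""
        self.errors.append(TransferError(path, severity, message))
        if severity == FATAL:
            raise FatalTransferError(f"{path}: {message}")

    def exit_code(self):
        """Nonzero if anything at all went wrong, even just warnings."""
        return 1 if self.errors else 0
```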
| 358 | |
| 359 | |
| 360 | Concurrency: |
| 361 | |
| 362 | - We want to keep the CPU, filesystem, and network as full as |
| 363 | possible as much of the time as possible. |
| 364 | |
| 365 | - We can do nonblocking network IO, but not so for disk. |
| 366 | |
| 367 | - On the destination, it makes sense to generate signatures and |
| 368 | apply patches at the same time. |
| 369 | |
| 370 | - Can structure this with nonblocking, threads, separate processes, |
| 371 | etc. |
| 372 | |
| 373 | |
| 374 | Uses: |
| 375 | |
| 376 | - Mirroring software distributions |
| 377 | |
| 378 | - Synchronizing laptop and desktop |
| 379 | |
| 380 | - NFS filesystem migration/replication. See |
| 381 | http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764 |
| 382 | |
| 383 | - Sync with PDA |
| 384 | |
| 385 | - Network backup systems |
| 386 | |
| 387 | - CVS filemover |
| 388 | |
| 389 | |
| 390 | Conflict resolution: |
| 391 | |
| 392 | - Requires application-specific knowledge. We want to provide |
| 393 | mechanism, rather than policy. |
| 394 | |
| 395 | - Possibly allowing two-way migration across a single connection |
| 396 | would be useful. |
| 397 | |
| 398 | |
| 399 | Moved files: <http://rsync.samba.org/cgi-bin/rsync.fom?file=44> |
| 400 | |
| 401 | - There's no trivial way to detect renamed files, especially if they |
| 402 | move between directories. |
| 403 | |
| 404 | - If we had a picture of the remote directory from last time on |
| 405 | either machine, then the inode numbers might give us a hint about |
| 406 | files which may have been renamed. |
| 407 | |
| 408 | - Files that are renamed and not modified can be detected by |
| 409 | examining the directory listing, looking for files with the same |
| 410 | size/date as the origin. |
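
The size/date heuristic in the last item can be sketched as below.
The function name and the listing format (a dict of path to
(size, mtime)) are assumptions for illustration.  A match is only a
hint: the contents would still need to be verified before the old file
is reused.

```python
from collections import defaultdict


def guess_renames(old_listing, new_listing):
    """Map each newly appeared path to the vanished path it probably came from.

    Both arguments map path -> (size, mtime).  Only unambiguous matches
    are returned.
    """
    vanished = defaultdict(list)
    for path, meta in old_listing.items():
        if path not in new_listing:
            vanished[meta].append(path)

    appeared = defaultdict(list)
    for path, meta in new_listing.items():
        if path not in old_listing:
            appeared[meta].append(path)

    renames = {}
    for meta, new_paths in appeared.items():
        old_paths = vanished.get(meta, [])
        # Skip the case where several candidates share a size and date.
        if len(new_paths) == 1 and len(old_paths) == 1:
            renames[new_paths[0]] = old_paths[0]
    return renames
```

Because paths are compared in full, this also catches files that moved
between directories, as long as they were not modified on the way.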
| 411 | |
| 412 | |
| 413 | Filesystem migration: |
| 414 | |
| 415 | NFSv4 probably wants to migrate file locks, but that's not really |
| 416 | our problem. |
| 417 | |
| 418 | |
| 419 | Atomic updates: |
| 420 | |
| 421 | The NFSv4 working group wants atomic migration. Most of the |
| 422 | responsibility for this lies on the NFS server or OS. |
| 423 | |
| 424 | If migrating a whole tree, then we could do a nearly-atomic rename |
| 425 | at the end. This ties in to having separate basis and destination |
| 426 | files. |
| 427 | |
| 428 | There's no way in Unix to replace a whole set of files atomically. |
| 429 | However, if we get them all onto the destination machine and then do |
| 430 | the updates quickly it would greatly reduce the window. |
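
A rough sketch of "get them all onto the destination, then update
quickly".  The function name and the in-memory new_contents mapping
are hypothetical, and file names are assumed to be flat (no
subdirectories).  Each individual rename is atomic on POSIX, but the
set as a whole is not, so this narrows the window rather than closing
it.

```python
import os
import tempfile


def stage_then_commit(dest_dir, new_contents):
    """Write every file under a temporary name first, then rename them all.

    `new_contents` maps a file name inside dest_dir to its new bytes.
    """
    staged = []  # (temporary path, final path) pairs
    try:
        for name, data in new_contents.items():
            final_path = os.path.join(dest_dir, name)
            # Stage in the same directory so the rename stays on one filesystem.
            fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".stage-")
            staged.append((tmp_path, final_path))
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
    except BaseException:
        # Staging failed part way: leave the destination untouched.
        for tmp_path, _final in staged:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
        raise
    # The slow part is done; readers can see a mix of old and new files
    # only while this loop runs.
    for tmp_path, final_path in staged:
        os.replace(tmp_path, final_path)
```

Permissions, ownership and timestamps are ignored here; a real
implementation would set them on the staged file before the rename.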
| 431 | |
| 432 | |
| 433 | Scalability: |
| 434 | |
| 435 | We should aim to work well on machines in use in a year or two. |
| 436 | That probably means transfers of many millions of files in one |
| 437 | batch, and gigabytes or terabytes of data. |
| 438 | |
| 439 | For argument's sake: at the low end, we want to sync ten files for a |
| 440 | total of 10kB across a 1kB/s link. At the high end, we want to sync |
| 441 | 1e9 files for 1TB of data across a 1GB/s link. |
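
To put rough numbers on those two ends, here is a back-of-envelope
calculation.  It assumes decimal units (1kB = 1000 bytes) and, purely
for illustration, 100 bytes of memory per file-list entry; that figure
is a guess, not a measurement of the current implementation.

```python
def full_copy_seconds(total_bytes, link_bytes_per_second):
    """Time to push every byte across the link, ignoring protocol overhead."""
    return total_bytes / link_bytes_per_second


# Low end: ten files, 10 kB in total, 1 kB/s link.
LOW_SECONDS = full_copy_seconds(10 * 1000, 1000)          # 10.0

# High end: 1e9 files, 1 TB in total, 1 GB/s link.
HIGH_FILES = 10**9
HIGH_BYTES = 10**12
HIGH_SECONDS = full_copy_seconds(HIGH_BYTES, 10**9)       # 1000.0
HIGH_AVERAGE_FILE_BYTES = HIGH_BYTES / HIGH_FILES         # 1000.0
HIGH_FILES_PER_SECOND = HIGH_FILES / HIGH_SECONDS         # 1000000.0

# Holding the whole file list in memory at the high end.
ASSUMED_BYTES_PER_ENTRY = 100  # hypothetical figure
HIGH_FILE_LIST_BYTES = HIGH_FILES * ASSUMED_BYTES_PER_ENTRY  # 1e11, i.e. 100 GB
```

So at the high end the per-file cost dominates: to keep the link full
we would have to handle about a million files per second, and under
that assumption an in-memory file list would need around 100 GB.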
| 442 | |
| 443 | On the whole, CPU usage is not normally a limiting factor, if only |
| 444 | because running over SSH burns a lot of cycles on encryption. |
| 445 | |
| 446 | Perhaps have resource throttling without relying on rlimit. |
| 447 | |
| 448 | |
| 449 | Streaming: |
| 450 | |
| 451 | A big attraction of rsync is that there are few round-trip delays: |
| 452 | basically only one to get started, and then everything is |
| 453 | pipelined. This is a problem with FTP, and NFS (at least up to |
| 454 | v3). NFSv4 can pipeline operations, but building on that is |
| 455 | probably a bit complicated. |
| 456 | |
| 457 | |
| 458 | Related work: |
| 459 | |
| 460 | - mirror.pl http://freshmeat.net/project/mirror/ |
| 461 | |
| 462 | - ProFTPd |
| 463 | |
| 464 | - Apache |
| 465 | |
| 466 | - http://freshmeat.net/search/?site=Freshmeat&q=mirror&section=projects |
| 467 | |
| 468 | - BitTorrent -- p2p mirroring |
| 469 | http://bitconjurer.org/BitTorrent/ |