Commit | Line | Data |
---|---|---|
4f69fe59 MP |
1 | -*- indented-text -*- |
2 | ||
3 | Notes towards a new version of rsync | |
3c6cd53b | 4 | Martin Pool <mbp@samba.org>, September 2001. |
4f69fe59 MP |
5 | |
6 | ||
7 | Good things about the current implementation: | |
8 | ||
9 | - Widely known and adopted. | |
10 | ||
11 | - Fast/efficient, especially for moderately small sets of files over | |
12 | slow links (transoceanic or modem). | |
13 | ||
14 | - Fairly reliable. | |
15 | ||
16 | - The choice of running over a plain TCP socket or tunneling over | |
17 | ssh. | |
18 | ||
19 | - rsync operations are idempotent: you can always run the same | |
20 | command twice to make sure it worked properly without any fear. | |
21 | (Are there any exceptions?) | |
22 | ||
23 | - Small changes to files cause small deltas. | |
24 | ||
25 | - There is a way to evolve the protocol to some extent. | |
26 | ||
27 | - rdiff and rsync --write-batch allow generation of standalone patch | |
28 | sets. rsync+ is pretty cheesy, though. xdelta seems cleaner. | |
29 | ||
30 | - Process triangle is creative, but seems to provoke OS bugs. | |
31 | ||
32 | - "Morning-after property": you don't need to know anything on the | |
33 | local machine about the state of the remote machine, or about | |
34 | transfers that have been done in the past. | |
35 | ||
36 | - You can easily push or pull simply by switching the order of | |
37 | files. | |
38 | ||
3c6cd53b MP |
39 | - The "modules" system has some neat features compared to |
40 | e.g. Apache's per-directory configuration. In particular, because | |
41 | you can set a userid and chroot directory, there is strong | |
42 | protection between different modules. I haven't seen any calls | |
43 | for a more flexible system. | |
44 | ||
4f69fe59 MP |
45 | |
46 | Bad things about the current implementation: | |
47 | ||
48 | - Persistent and hard-to-diagnose hang bugs remain | |
49 | ||
50 | - Protocol is sketchily documented, tied to this implementation, and | |
51 | hard to modify/extend | |
52 | ||
53 | - Both the program and the protocol assume a single non-interactive | |
54 | one-way transfer | |
55 | ||
56 | - A list of all files is held in memory for the entire transfer, | |
57 | which cripples scalability to large file trees | |
58 | ||
59 | - Opening a new socket for every operation causes problems, | |
60 | especially when running over SSH with password authentication. | |
61 | ||
62 | - Renamed files are not handled: the old file is removed, and the | |
63 | new file created from scratch. | |
64 | ||
65 | - The versioning approach assumes that future versions of the | |
66 | program know about all previous versions, and will do the right | |
67 | thing. | |
68 | ||
69 | - People always get confused about ':' vs '::' | |
70 | ||
71 | - Error messages can be cryptic. | |
72 | ||
3c6cd53b MP |
73 | - Default behaviour is not intuitive: in too many cases rsync will |
74 | happily do nothing. Perhaps -a should be the default? | |
75 | ||
76 | - People get confused by trailing slashes, though it's hard to think | |
77 | of another reasonable way to make this necessary distinction | |
78 | between a directory and its contents. | |
79 | ||
4f69fe59 MP |
80 | |
81 | Protocol philosophy: | |
82 | ||
83 | *The* big difference between rsync and protocols like HTTP, FTP, | |
84 | and NFS is that their fundamental operations are "read this file", | |
85 | "delete this file", and "make this directory", whereas rsync's is | |
86 | "make this directory like this one". | |
87 | ||
88 | ||
89 | Questionable features: | |
90 | ||
91 | These are neat, but not necessarily clean or worth preserving. | |
92 | ||
93 | - The remote rsync can be wrapped by some other program, such as in | |
94 | tridge's rsync-mail scripts. The general feature of sending and | |
95 | retrieving mail over rsync is good, but this is perhaps not the | |
96 | right way to implement it. | |
97 | ||
98 | ||
99 | Desirable features: | |
100 | ||
101 | These don't really require architectural changes; they're just | |
102 | something to keep in mind. | |
103 | ||
104 | - Synchronize ACLs and extended attributes | |
105 | ||
106 | - Anonymous servers should be efficient | |
107 | ||
108 | - Code should be portable to non-UNIX systems | |
109 | ||
110 | - Should be possible to document the protocol in RFC form | |
111 | ||
112 | - --dry-run option | |
113 | ||
114 | - IPv6 support. Pretty straightforward. | |
115 | ||
116 | - Allow the basis and destination files to be different. For | |
117 | example, you could use this when you have a CD-ROM and want to | |
118 | download an updated image onto a hard drive. | |
119 | ||
120 | - Efficiently interrupt and restart a transfer. We can write a | |
121 | checkpoint file that says where we're up to in the filesystem. | |
122 | Alternatively, as long as transfers are idempotent, we can just | |
123 | restart the whole thing. [NFSv4] | |
124 | ||
125 | - Scripting support. | |
126 | ||
127 | - Propagate atimes and do not modify them. This is very ugly on | |
128 | Unix. It might be better to try to add O_NOATIME to kernels, and | |
129 | call that. | |
130 | ||
4f69fe59 MP |
131 | - Unicode. Probably just use UTF-8 for everything. |
132 | ||
3c6cd53b MP |
133 | - Open authentication system. Can we use PAM? Is SASL an adequate |
134 | mapping of PAM to the network, or useful in some other way? | |
135 | ||
136 | - Resume interrupted transfers without the --partial flag. We need | |
137 | to leave the temporary file behind, and then know to use it. This | |
138 | leaves a risk of large temporary files accumulating, which is not | |
139 | good. Perhaps it should be off by default. | |
140 | ||
141 | - tcpwrappers support. Should be trivial; can already be done | |
142 | through tcpd or inetd. | |
143 | ||
144 | - Socks support built in. It's not clear this is any better than | |
145 | just linking against the socks library, though. | |
146 | ||
147 | - When run over SSH, invoke with predictable command-line arguments, | |
148 | so that people can restrict what commands sshd will run. (Is this | |
149 | really required?) | |
150 | ||
151 | - Comparison mode: give a list of which files are new, gone, or | |
152 | different. Set return code depending on whether anything has | |
153 | changed. | |
154 | ||
155 | - Internationalized messages (gettext?) | |
156 | ||
157 | - Optionally use real regexps rather than globs? | |
158 | ||
159 | - Show overall progress. Pretty hard to do, especially if we insist | |
160 | on not scanning the directory tree up front. | |
161 | ||
162 | ||
163 | Regression testing: | |
164 | ||
165 | - Support automatic testing. | |
166 | ||
167 | - Have hard internal timeouts against hangs. | |
168 | ||
169 | - Be deterministic. | |
170 | ||
171 | - Measure performance. | |
172 | ||
4f69fe59 MP |
173 | |
174 | Hard links: | |
175 | ||
176 | At the moment, we can recreate hard links, but it's a bit | |
177 | inefficient: it depends on holding a list of all files in the tree. | |
178 | Every time we see a file with a linkcount >1, we need to search for | |
179 | another known name that has the same (fsid,inum) tuple. We could do | |
180 | that more efficiently by keeping a list of only files with | |
181 | linkcount>1, and removing files from that list as all their names | |
182 | become known. | |
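
The more efficient bookkeeping suggested above might look roughly like the following Python sketch. It is an illustration only, not rsync's code: the class name `HardLinkTracker` and the method `see` are hypothetical, and `(st_dev, st_ino)` from `os.lstat` stands in for the (fsid,inum) tuple.

```python
import os
import stat


class HardLinkTracker:
    """Remember only multiply-linked files, and forget each one as soon
    as all of its names have been seen (hypothetical helper)."""

    def __init__(self):
        # (st_dev, st_ino) -> [first name seen, names still expected]
        self.pending = {}

    def see(self, path):
        """Return an earlier name for the same file, or None."""
        st = os.lstat(path)
        # Directories always have a link count above 1; skip them.
        if st.st_nlink <= 1 or stat.S_ISDIR(st.st_mode):
            return None
        key = (st.st_dev, st.st_ino)
        entry = self.pending.get(key)
        if entry is None:
            # First name for this inode: st_nlink - 1 names remain.
            self.pending[key] = [path, st.st_nlink - 1]
            return None
        entry[1] -= 1
        if entry[1] == 0:
            del self.pending[key]  # all names are now known
        return entry[0]
```

One caveat of this sketch: a file with extra names outside the tree being walked never reaches zero, so its entry stays in `pending` until the walk ends.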
183 | ||
184 | ||
3c6cd53b MP |
185 | Command-line options: |
186 | ||
187 | We have rather a lot at the moment. We might get more if the tool | |
188 | becomes more flexible. Do we need a .rc or configuration file? | |
189 | That wouldn't really fit with its pattern of use: cp and tar don't | |
190 | have them, though ssh does. | |
191 | ||
192 | ||
4f69fe59 MP |
193 | Scripting issues: |
194 | ||
195 | - Perhaps support multiple scripting languages: candidates include | |
196 | Perl, Python, Tcl, Scheme (guile?), sh, ... | |
197 | ||
198 | - Simply running a subprocess and looking at its stdout/exit code | |
199 | might be sufficient, though it could also be pretty slow if it's | |
200 | called often. | |
201 | ||
202 | - There are security issues about running remote code, at least if | |
203 | it's not running in the user's own account. So we can either | |
204 | disallow it, or use some kind of sandbox system. | |
205 | ||
3c6cd53b MP |
206 | - Python is a good language, but the syntax is not so good for |
207 | giving small fragments on the command line. | |
208 | ||
209 | - Tcl is broken Lisp. | |
210 | ||
211 | - Lots of sysadmins know Perl, though Perl can give some bizarre or | |
212 | confusing errors. The built-in stat operators and regexps might | |
213 | be useful. | |
214 | ||
215 | - Sadly probably not enough people know Scheme. | |
216 | ||
217 | - sh is hard to embed. | |
218 | ||
4f69fe59 MP |
219 | |
220 | Scripting hooks: | |
221 | ||
222 | - Whether to transfer a file | |
223 | ||
224 | - What basis file to use | |
225 | ||
226 | - Logging | |
227 | ||
228 | - Whether to allow transfers (for public servers) | |
229 | ||
230 | - Authentication | |
231 | ||
232 | - Locking | |
233 | ||
3c6cd53b MP |
234 | - Cache |
235 | ||
236 | - Generating backup path/name. | |
237 | ||
238 | - Post-processing of backups, e.g. to do compression. | |
239 | ||
240 | - After transfer, before replacement: so that we can spit out a diff | |
241 | of what was changed, or kick off some kind of reconciliation | |
242 | process. | |
243 | ||
244 | ||
245 | VFS: | |
246 | ||
247 | Rather than talking straight to the filesystem, rsyncd talks through | |
248 | an internal API. Samba has one. Is it useful? | |
249 | ||
250 | - Could be a tidy way to implement cached signatures. | |
251 | ||
252 | - Keep files compressed on disk? | |
253 | ||
4f69fe59 MP |
254 | |
255 | Interactive interface: | |
256 | ||
257 | - Something like ncFTP, or integration into GNOME-vfs. Probably | |
258 | hold a single socket connection open. | |
259 | ||
260 | - Can either call us as a separate process, or as a library. | |
261 | ||
262 | - The standalone process needs to produce output in a form easily | |
263 | digestible by a calling program, like the --emacs feature some | |
3c6cd53b MP |
264 | have. Same goes for progress output: rpm prints a series of hash symbols, |
265 | which are easier for a GUI to handle than "\r30% complete" | |
266 | strings. | |
4f69fe59 MP |
267 | |
268 | - Yow! emacs support. (You could probably build that already, of | |
3c6cd53b MP |
269 | course.) I'd like to be able to write a simple script on a remote |
270 | machine that rsyncs it to my workstation, edits it there, then | |
271 | pushes it back up. | |
4f69fe59 MP |
272 | |
273 | ||
274 | Pie-in-the-sky features: | |
275 | ||
276 | These might have a severe impact on the protocol, and are not | |
277 | clearly in our core requirements. It looks like in many of them | |
278 | having scripting hooks will allow us | |
279 | ||
280 | - Transport over UDP multicast. The hard part is handling multiple | |
281 | destinations which have different basis files. We can look at | |
282 | multicast-TFTP for inspiration. | |
283 | ||
284 | - Conflict resolution. Possibly general scripting support will be | |
285 | sufficient. | |
286 | ||
287 | - Integrate with locking. It's hard to see a good general solution, | |
288 | because Unix systems have several locking mechanisms, and grabbing | |
289 | the lock from programs that don't expect it could cause deadlocks, | |
290 | timeouts, or other problems. Scripting support might help. | |
291 | ||
292 | - Replicate in place, rather than to a temporary file. This is | |
293 | dangerous in the case of interruption, and it also means that the | |
294 | delta can't refer to blocks that have already been overwritten. | |
295 | On the other hand we could semi-trivially do this at first by | |
296 | simply generating a delta with no copy instructions. | |
297 | ||
298 | - Replicate block devices. Most of the difficulties here are to do | |
299 | with replication in place, though on some systems we will also | |
300 | have to do I/O on block boundaries. | |
301 | ||
3c6cd53b MP |
302 | - Peer to peer features. Flavour of the year. Can we think about |
303 | ways for clients to smoothly and voluntarily become servers for | |
304 | content they receive? | |
305 | ||
a24e12e6 MP |
306 | - Imagine a situation where the destination has a much faster link |
307 | to the cloud than the source. In this case, Mojo Nation downloads | |
308 | interleaved blocks from several slower servers. The general | |
309 | situation might be a way for a master rsync process to farm out | |
310 | tasks to several subjobs. In this particular case they'd need | |
311 | different sockets. This might be related to multicast. | |
312 | ||
3c6cd53b MP |
313 | |
314 | Unlikely features: | |
315 | ||
316 | - Allow remote source and destination. If this can be cleanly | |
317 | designed into the protocol, perhaps with the remote machine acting | |
318 | as a kind of echo, then it's good. It's uncommon enough that we | |
319 | don't want to shape the whole protocol around it, though. | |
320 | ||
321 | In fact, in a triangle of machines there are two possibilities: | |
322 | all traffic passes from remote1 to remote2 through local, or local | |
323 | just sets up the transfer and then remote1 talks to remote2. FTP | |
324 | supports the second but it's not clearly good. There are some | |
325 | security problems with being able to instruct one machine to open | |
326 | a connection to another. | |
327 | ||
4f69fe59 MP |
328 | |
329 | In favour of evolving the protocol: | |
330 | ||
331 | - Keeping compatibility with existing rsync servers will help with | |
332 | adoption and testing. | |
333 | ||
334 | - We should at the very least be able to fall back to the old | |
335 | protocol. | |
336 | ||
337 | - Error handling is not so good. | |
338 | ||
339 | ||
340 | In favour of using a new protocol: | |
341 | ||
342 | - Maintaining compatibility might soak up development time that | |
343 | would better go into improving a new protocol. | |
344 | ||
345 | - If we start from scratch, it can be documented as we go, and we | |
346 | can avoid design decisions that make the protocol complex or | |
347 | implementation-bound. | |
348 | ||
349 | ||
350 | Error handling: | |
351 | ||
352 | - Errors should come back reliably, and be clearly associated with | |
353 | the particular file that caused the problem. | |
354 | ||
355 | - Some errors ought to cause the whole transfer to abort; some are | |
356 | just warnings. If any errors have occurred, then rsync ought to | |
357 | return an error. | |
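
A minimal Python sketch of the error-handling rules above, under stated assumptions: the names `ErrorLog` and `FatalTransferError` are hypothetical, and the exit status of 1 is illustrative rather than one of rsync's actual exit codes.

```python
from dataclasses import dataclass, field


class FatalTransferError(Exception):
    """Raised for errors that should abort the whole transfer."""


@dataclass
class ErrorLog:
    # Each entry is (path, message), so every error names its file.
    entries: list = field(default_factory=list)

    def warn(self, path, message):
        """Record a per-file problem and keep going."""
        self.entries.append((path, message))

    def fatal(self, path, message):
        """Record the problem, then abort the transfer."""
        self.entries.append((path, message))
        raise FatalTransferError(f"{path}: {message}")

    def exit_code(self):
        """Nonzero if any error at all occurred, warning or fatal."""
        return 1 if self.entries else 0
```

The caller would catch `FatalTransferError` at the top level, report `entries`, and exit with `exit_code()`, so a run that only produced warnings still returns an error.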
358 | ||
359 | ||
360 | Concurrency: | |
361 | ||
362 | - We want to keep the CPU, filesystem, and network as full as | |
363 | possible as much of the time as possible. | |
364 | ||
365 | - We can do nonblocking network IO, but not so for disk. | |
366 | ||
367 | - On the destination, it makes sense to be generating signatures | |
368 | and applying patches at the same time. | |
369 | ||
370 | - Can structure this with nonblocking, threads, separate processes, | |
371 | etc. | |
372 | ||
373 | ||
374 | Uses: | |
375 | ||
376 | - Mirroring software distributions | |
377 | ||
378 | - Synchronizing laptop and desktop | |
379 | ||
380 | - NFS filesystem migration/replication. See | |
381 | http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764 | |
382 | ||
383 | - Sync with PDA | |
384 | ||
385 | - Network backup systems | |
386 | ||
387 | - CVS filemover | |
388 | ||
389 | ||
390 | Conflict resolution: | |
391 | ||
392 | - Requires application-specific knowledge. We want to provide | |
393 | mechanism, rather than policy. | |
394 | ||
395 | - Possibly allowing two-way migration across a single connection | |
396 | would be useful. | |
397 | ||
398 | ||
3c6cd53b | 399 | Moved files: <http://rsync.samba.org/cgi-bin/rsync.fom?file=44> |
4f69fe59 MP |
400 | |
401 | - There's no trivial way to detect renamed files, especially if they | |
402 | move between directories. | |
403 | ||
404 | - If we had a picture of the remote directory from last time on | |
405 | either machine, then the inode numbers might give us a hint about | |
406 | files which may have been renamed. | |
407 | ||
408 | - Files that are renamed and not modified can be detected by | |
409 | examining the directory listing, looking for files with the same | |
410 | size/date as the origin. | |
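
The size/date heuristic in the last point could be sketched in Python as follows. This is an assumption-laden illustration: `guess_renames` is a hypothetical name, and both listings are assumed to be available as dictionaries mapping path to a `(size, mtime)` pair.

```python
from collections import defaultdict


def guess_renames(old, new):
    """old and new map path -> (size, mtime).  Return {new_path: old_path}
    for paths that probably name the same unmodified file."""
    gone = defaultdict(list)   # signature -> paths that disappeared
    for path, sig in old.items():
        if path not in new:
            gone[sig].append(path)
    added = defaultdict(list)  # signature -> paths that appeared
    for path, sig in new.items():
        if path not in old:
            added[sig].append(path)
    renames = {}
    for sig, new_paths in added.items():
        old_paths = gone.get(sig, [])
        # Only trust unambiguous one-to-one matches.
        if len(new_paths) == 1 and len(old_paths) == 1:
            renames[new_paths[0]] = old_paths[0]
    return renames
```

A match here is only a hint. Two unrelated files can share a size and date, so the old file is best treated as a candidate basis and checked by the usual checksums, not assumed identical.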
411 | ||
412 | ||
413 | Filesystem migration: | |
414 | ||
3c6cd53b MP |
415 | NFSv4 probably wants to migrate file locks, but that's not really |
416 | our problem. | |
417 | ||
418 | ||
419 | Atomic updates: | |
420 | ||
4f69fe59 MP |
421 | The NFSv4 working group wants atomic migration. Most of the |
422 | responsibility for this lies on the NFS server or OS. | |
423 | ||
424 | If migrating a whole tree, then we could do a nearly-atomic rename | |
425 | at the end. This ties in to having separate basis and destination | |
426 | files. | |
427 | ||
3c6cd53b MP |
428 | There's no way in Unix to replace a whole set of files atomically. |
429 | However, if we get them all onto the destination machine and then do | |
430 | the updates quickly it would greatly reduce the window. | |
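
The "get everything across, then update quickly" idea might be sketched like this in Python. The function name `apply_staged` is hypothetical, and the sketch assumes flat file names and file contents already held in memory; a real transfer would stream to the staging area.

```python
import os
import tempfile


def apply_staged(updates, dest_dir):
    """updates maps file name -> bytes.  Write everything to a staging
    directory first, then rename the files into place in one quick pass."""
    # Staging inside dest_dir keeps the renames on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=dest_dir)
    staged = []
    # Slow phase: the existing files in dest_dir are untouched.
    for name, data in updates.items():
        tmp = os.path.join(staging, name)
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        staged.append((tmp, os.path.join(dest_dir, name)))
    # Fast phase: each rename is atomic, but the set as a whole is not.
    for tmp, final in staged:
        os.replace(tmp, final)
    os.rmdir(staging)
```

This shrinks the window in which readers can see a mix of old and new files to the duration of the rename loop. It does not close the window.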
4f69fe59 MP |
431 | |
432 | ||
433 | Scalability: | |
434 | ||
435 | We should aim to work well on machines in use in a year or two. | |
436 | That probably means transfers of many millions of files in one | |
437 | batch, and gigabytes or terabytes of data. | |
438 | ||
439 | For argument's sake: at the low end, we want to sync ten files for a | |
440 | total of 10kB across a 1kB/s link. At the high end, we want to sync | |
441 | 1e9 files for 1TB of data across a 1GB/s link. | |
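
As a rough check on these two endpoints, the arithmetic can be worked through in Python. The sketch assumes decimal units, reads the link speeds as bytes per second, and ignores protocol overhead and any delta savings; `wire_seconds` is a hypothetical helper.

```python
KB, GB, TB = 10**3, 10**9, 10**12  # decimal units, an assumption


def wire_seconds(total_bytes, link_bytes_per_sec):
    """Time to push the raw data through the link, ignoring overhead."""
    return total_bytes / link_bytes_per_sec


low = wire_seconds(10 * KB, 1 * KB)   # low end: ten files, 10kB in total
high = wire_seconds(1 * TB, 1 * GB)   # high end: 1e9 files, 1TB in total

avg_low = 10 * KB / 10                # average file size at the low end
avg_high = 1 * TB / 10**9             # average file size at the high end
```

Under these assumptions the raw transfer takes 10 seconds at the low end and 1000 seconds at the high end, and the average file is 1kB in both cases. The per-file cost therefore matters at both extremes, not just the byte count.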
442 | ||
443 | On the whole CPU usage is not normally a limiting factor, if only | |
444 | because running over SSH burns a lot of cycles on encryption. | |
445 | ||
3c6cd53b MP |
446 | Perhaps have resource throttling without relying on rlimit. |
447 | ||
4f69fe59 MP |
448 | |
449 | Streaming: | |
450 | ||
451 | A big attraction of rsync is that there are few round-trip delays: | |
452 | basically only one to get started, and then everything is | |
453 | pipelined. Round trips are a problem with FTP and NFS (at least up to | |
454 | v3). NFSv4 can pipeline operations, but building on that is | |
455 | probably a bit complicated. | |
3c6cd53b MP |
456 | |
457 | ||
458 | Related work: | |
459 | ||
460 | - mirror.pl http://freshmeat.net/project/mirror/ | |
461 | ||
462 | - ProFTPd | |
463 | ||
464 | - Apache | |
465 | ||
466 | - http://freshmeat.net/search/?site=Freshmeat&q=mirror&section=projects | |
467 | ||
468 | - BitTorrent -- p2p mirroring | |
469 | http://bitconjurer.org/BitTorrent/ |