| 1 | -*- indented-text -*- |
| 2 | |
| 3 | Notes towards a new version of rsync |
| 4 | Martin Pool <mbp@samba.org> |
| 5 | |
| 6 | |
| 7 | Good things about the current implementation: |
| 8 | |
| 9 | - Widely known and adopted. |
| 10 | |
| 11 | - Fast/efficient, especially for moderately small sets of files over |
| 12 | slow links (transoceanic or modem). |
| 13 | |
| 14 | - Fairly reliable. |
| 15 | |
| 16 | - The choice of running over a plain TCP socket or tunneling over |
| 17 | ssh. |
| 18 | |
| 19 | - rsync operations are idempotent: you can always run the same |
| 20 | command twice to make sure it worked properly without any fear. |
| 21 | (Are there any exceptions?) |
| 22 | |
| 23 | - Small changes to files cause small deltas. |
| 24 | |
| 25 | - There is a way to evolve the protocol to some extent. |
| 26 | |
| 27 | - rdiff and rsync --write-batch allow generation of standalone patch |
| 28 | sets. rsync+ is pretty cheesy, though. xdelta seems cleaner. |
| 29 | |
| 30 | - Process triangle is creative, but seems to provoke OS bugs. |
| 31 | |
| 32 | - "Morning-after property": you don't need to know anything on the |
| 33 | local machine about the state of the remote machine, or about |
| 34 | transfers that have been done in the past. |
| 35 | |
| 36 | - You can easily push or pull simply by switching the order of the |
| 37 | source and destination arguments. |
| 38 | |
| 39 | |
| 40 | Bad things about the current implementation: |
| 41 | |
| 42 | - Persistent and hard-to-diagnose hang bugs remain |
| 43 | |
| 44 | - Protocol is sketchily documented, tied to this implementation, and |
| 45 | hard to modify/extend |
| 46 | |
| 47 | - Both the program and the protocol assume a single non-interactive |
| 48 | one-way transfer |
| 49 | |
| 50 | - A list of all files is held in memory for the entire transfer, |
| 51 | which cripples scalability to large file trees |
| 52 | |
| 53 | - Opening a new socket for every operation causes problems, |
| 54 | especially when running over SSH with password authentication. |
| 55 | |
| 56 | - Renamed files are not handled: the old file is removed, and the |
| 57 | new file created from scratch. |
| 58 | |
| 59 | - The versioning approach assumes that future versions of the |
| 60 | program know about all previous versions, and will do the right |
| 61 | thing. |
| 62 | |
| 63 | - People always get confused about ':' vs '::' |
| 64 | |
| 65 | - Error messages can be cryptic. |
| 66 | |
| 67 | |
| 68 | Protocol philosophy: |
| 69 | |
| 70 | *The* big difference between rsync and protocols like HTTP, FTP, and |
| 71 | NFS is that their fundamental operations are "read this file", |
| 72 | "delete this file", and "make this directory", whereas rsync's is |
| 73 | "make this directory like this one". |
| 74 | |
| 75 | |
| 76 | Questionable features: |
| 77 | |
| 78 | These are neat, but not necessarily clean or worth preserving. |
| 79 | |
| 80 | - The remote rsync can be wrapped by some other program, such as in |
| 81 | tridge's rsync-mail scripts. The general feature of sending and |
| 82 | retrieving mail over rsync is good, but this is perhaps not the |
| 83 | right way to implement it. |
| 84 | |
| 85 | |
| 86 | Desirable features: |
| 87 | |
| 88 | These don't really require architectural changes; they're just |
| 89 | something to keep in mind. |
| 90 | |
| 91 | - Synchronize ACLs and extended attributes |
| 92 | |
| 93 | - Anonymous servers should be efficient |
| 94 | |
| 95 | - Code should be portable to non-UNIX systems |
| 96 | |
| 97 | - Should be possible to document the protocol in RFC form |
| 98 | |
| 99 | - --dry-run option |
| 100 | |
| 101 | - IPv6 support. Pretty straightforward. |
| 102 | |
| 103 | - Allow the basis and destination files to be different. For |
| 104 | example, you could use this when you have a CD-ROM and want to |
| 105 | download an updated image onto a hard drive. |
| 106 | |
| 107 | - Efficiently interrupt and restart a transfer. We can write a |
| 108 | checkpoint file that says where we're up to in the filesystem. |
| 109 | Alternatively, as long as transfers are idempotent, we can just |
| 110 | restart the whole thing. [NFSv4] |
| 111 | |
| 112 | - Scripting support. |
| 113 | |
| 114 | - Propagate atimes and do not modify them. This is very ugly on |
| 115 | Unix. It might be better to try to add O_NOATIME to kernels, and |
| 116 | call that. |
| 117 | |
| 118 | - VFS. Useful? |
| 119 | |
| 120 | - Unicode. Probably just use UTF-8 for everything. |
| 121 | |
| 122 | |
| 123 | Hard links: |
| 124 | |
| 125 | At the moment, we can recreate hard links, but it's a bit |
| 126 | inefficient: it depends on holding a list of all files in the tree. |
| 127 | Every time we see a file with a linkcount >1, we need to search for |
| 128 | another known name that has the same (fsid,inum) tuple. We could do |
| 129 | that more efficiently by keeping a list of only files with |
| 130 | linkcount>1, and removing files from that list as all their names |
| 131 | become known. |
| 132 | |
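The improvement described above could be sketched as follows. This is
a minimal illustration in Python, not the current rsync code: the
class name HardLinkTracker and its observe() method are hypothetical,
and the (fsid,inum) tuple is assumed to map onto st_dev and st_ino.

```python
from types import SimpleNamespace


class HardLinkTracker:
    """Track only inodes with more than one link, keyed by (fsid, inum)."""

    def __init__(self):
        # (st_dev, st_ino) -> [first path seen, number of names not yet seen]
        self._pending = {}

    def observe(self, path, st):
        """Return an already-known name for this inode, or None.

        `st` is anything with st_dev, st_ino and st_nlink, such as the
        result of os.lstat(path).
        """
        if st.st_nlink <= 1:
            return None  # singly linked files are never stored
        key = (st.st_dev, st.st_ino)
        entry = self._pending.get(key)
        if entry is None:
            # First name for this inode: remember it and how many remain.
            self._pending[key] = [path, st.st_nlink - 1]
            return None
        entry[1] -= 1
        if entry[1] == 0:
            # Every name has now been seen, so the inode can be forgotten.
            del self._pending[key]
        return entry[0]

    def __len__(self):
        return len(self._pending)


# Fake stat records stand in for os.lstat() results in this demo.
shared = SimpleNamespace(st_dev=1, st_ino=10, st_nlink=3)
single = SimpleNamespace(st_dev=1, st_ino=11, st_nlink=1)

tracker = HardLinkTracker()
results = [
    tracker.observe("a", shared),
    tracker.observe("plain", single),
    tracker.observe("b", shared),
    tracker.observe("c", shared),
]
```

One caveat with this sketch: if some names of an inode live outside
the tree being walked, its entry is never drained and stays in the
table until the run ends.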
| 133 | |
| 134 | Scripting issues: |
| 135 | |
| 136 | - Perhaps support multiple scripting languages: candidates include |
| 137 | Perl, Python, Tcl, Scheme (guile?), sh, ... |
| 138 | |
| 139 | - Simply running a subprocess and looking at its stdout/exit code |
| 140 | might be sufficient, though it could also be pretty slow if it's |
| 141 | called often. |
| 142 | |
| 143 | - There are security issues about running remote code, at least if |
| 144 | it's not running in the user's own account. So we can either |
| 145 | disallow it, or use some kind of sandbox system. |
| 146 | |
| 147 | |
| 148 | Scripting hooks: |
| 149 | |
| 150 | - Whether to transfer a file |
| 151 | |
| 152 | - What basis file to use |
| 153 | |
| 154 | - Logging |
| 155 | |
| 156 | - Whether to allow transfers (for public servers) |
| 157 | |
| 158 | - Authentication |
| 159 | |
| 160 | - Locking |
| 161 | |
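The "whether to transfer a file" hook, implemented as a subprocess
whose exit code is inspected, might look roughly like this. It is
only a sketch: the function hook_allows_transfer, the convention that
exit status 0 means "transfer", and the SKIP_BACKUPS example hook are
all assumptions made for illustration.

```python
import subprocess
import sys


def hook_allows_transfer(hook_argv, path):
    """Run the hook with `path` appended; exit status 0 means transfer it."""
    result = subprocess.run(
        hook_argv + [path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a nonzero status is an answer, not an error
    )
    return result.returncode == 0


# Hypothetical hook: refuse editor backup files whose names end in '~'.
SKIP_BACKUPS = [
    sys.executable,
    "-c",
    "import sys; sys.exit(1 if sys.argv[1].endswith('~') else 0)",
]
```

Each call starts a new process, which is the cost the notes above
warn about when the hook is invoked once per file.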
| 162 | |
| 163 | Interactive interface: |
| 164 | |
| 165 | - Something like ncFTP, or integration into GNOME-vfs. Probably |
| 166 | hold a single socket connection open. |
| 167 | |
| 168 | - Can either call us as a separate process, or as a library. |
| 169 | |
| 170 | - The standalone process needs to produce output in a form easily |
| 171 | digestible by a calling program, like the --emacs feature some |
| 172 | programs have. |
| 173 | |
| 174 | - Yow! emacs support. (You could probably build that already, of |
| 175 | course.) |
| 176 | |
| 177 | |
| 178 | Pie-in-the-sky features: |
| 179 | |
| 180 | These might have a severe impact on the protocol, and are not |
| 181 | clearly in our core requirements. It looks like scripting hooks |
| 182 | may be enough to support many of them. |
| 183 | |
| 184 | - Transport over UDP multicast. The hard part is handling multiple |
| 185 | destinations which have different basis files. We can look at |
| 186 | multicast-TFTP for inspiration. |
| 187 | |
| 188 | - Conflict resolution. Possibly general scripting support will be |
| 189 | sufficient. |
| 190 | |
| 191 | - Integrate with locking. It's hard to see a good general solution, |
| 192 | because Unix systems have several locking mechanisms, and grabbing |
| 193 | the lock from programs that don't expect it could cause deadlocks, |
| 194 | timeouts, or other problems. Scripting support might help. |
| 195 | |
| 196 | - Replicate in place, rather than to a temporary file. This is |
| 197 | dangerous in the case of interruption, and it also means that the |
| 198 | delta can't refer to blocks that have already been overwritten. |
| 199 | On the other hand we could semi-trivially do this at first by |
| 200 | simply generating a delta with no copy instructions. |
| 201 | |
| 202 | - Replicate block devices. Most of the difficulties here are to do |
| 203 | with replication in place, though on some systems we will also |
| 204 | have to do I/O on block boundaries. |
| 205 | |
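The replicate-in-place item above suggests starting with a delta that
has no copy instructions. A minimal sketch of that idea follows; the
tuple-based delta format and both function names are hypothetical
and do not reflect rsync's actual wire format.

```python
def literal_only_delta(new_data, chunk_size=4):
    """Encode new_data purely as literal instructions, with no copies."""
    return [
        ("literal", new_data[i:i + chunk_size])
        for i in range(0, len(new_data), chunk_size)
    ]


def apply_in_place(buf, delta):
    """Overwrite the bytearray `buf` front to back from a literal-only delta."""
    pos = 0
    for op in delta:
        if op[0] != "literal":
            # A copy could read a block that has already been overwritten.
            raise ValueError("only literal instructions are safe in place")
        data = op[1]
        buf[pos:pos + len(data)] = data
        pos += len(data)
    del buf[pos:]  # drop leftover bytes when the new file is shorter
    return buf
```

Such a delta carries the whole new file, so it gives up the bandwidth
savings of the rsync algorithm in exchange for being safe to apply
over the basis. An interrupted run still leaves a half-written file.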
| 206 | |
| 207 | In favour of evolving the protocol: |
| 208 | |
| 209 | - Keeping compatibility with existing rsync servers will help with |
| 210 | adoption and testing. |
| 211 | |
| 212 | - We should at the very least be able to fall back to the old |
| 213 | protocol. |
| 214 | |
| 215 | - Error handling is not so good. |
| 216 | |
| 217 | |
| 218 | In favour of using a new protocol: |
| 219 | |
| 220 | - Maintaining compatibility might soak up development time that |
| 221 | would better go into improving a new protocol. |
| 222 | |
| 223 | - If we start from scratch, it can be documented as we go, and we |
| 224 | can avoid design decisions that make the protocol complex or |
| 225 | implementation-bound. |
| 226 | |
| 227 | |
| 228 | Error handling: |
| 229 | |
| 230 | - Errors should come back reliably, and be clearly associated with |
| 231 | the particular file that caused the problem. |
| 232 | |
| 233 | - Some errors ought to cause the whole transfer to abort; some are |
| 234 | just warnings. If any errors have occurred, then rsync ought to |
| 235 | return an error. |
| 236 | |
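One way to picture these two requirements is a log that ties every
problem to a path and separates warnings from fatal errors. The
sketch below is illustrative only: the names ErrorLog and
run_transfer, the 0/1 exit status, and the choice of which exceptions
count as warnings are all assumptions.

```python
WARNING, FATAL = "warning", "fatal"


class FatalTransferError(Exception):
    """Raised to abort the whole transfer."""


class ErrorLog:
    def __init__(self):
        self.entries = []  # (severity, path, message) in order of occurrence

    def report(self, severity, path, message):
        # Every problem is recorded against the file that caused it.
        self.entries.append((severity, path, message))
        if severity == FATAL:
            raise FatalTransferError(f"{path}: {message}")

    def exit_status(self):
        # Any recorded problem, even a warning, yields a nonzero status.
        return 1 if self.entries else 0


def run_transfer(paths, transfer_one):
    """Call transfer_one(path) for each path, stopping at the first fatal error."""
    log = ErrorLog()
    try:
        for path in paths:
            try:
                transfer_one(path)
            except PermissionError as exc:
                log.report(WARNING, path, str(exc))  # skip file, keep going
            except OSError as exc:
                log.report(FATAL, path, str(exc))  # abort everything
    except FatalTransferError:
        pass
    return log
```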
| 237 | |
| 238 | Concurrency: |
| 239 | |
| 240 | - We want to keep the CPU, filesystem, and network as full as |
| 241 | possible as much of the time as possible. |
| 242 | |
| 243 | - We can do nonblocking network I/O, but not nonblocking disk I/O. |
| 244 | |
| 245 | - It makes sense for the destination to generate signatures and |
| 246 | apply patches at the same time. |
| 247 | |
| 248 | - Can structure this with nonblocking, threads, separate processes, |
| 249 | etc. |
| 250 | |
| 251 | |
| 252 | Uses: |
| 253 | |
| 254 | - Mirroring software distributions |
| 255 | |
| 256 | - Synchronizing laptop and desktop |
| 257 | |
| 258 | - NFS filesystem migration/replication. See |
| 259 | http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764 |
| 260 | |
| 261 | - Sync with PDA |
| 262 | |
| 263 | - Network backup systems |
| 264 | |
| 265 | - CVS filemover |
| 266 | |
| 267 | |
| 268 | Conflict resolution: |
| 269 | |
| 270 | - Requires application-specific knowledge. We want to provide |
| 271 | mechanism, rather than policy. |
| 272 | |
| 273 | - Possibly allowing two-way migration across a single connection |
| 274 | would be useful. |
| 275 | |
| 276 | |
| 277 | Moved files: |
| 278 | |
| 279 | - There's no trivial way to detect renamed files, especially if they |
| 280 | move between directories. |
| 281 | |
| 282 | - If we had a picture of the remote directory from last time on |
| 283 | either machine, then the inode numbers might give us a hint about |
| 284 | files which may have been renamed. |
| 285 | |
| 286 | - Files that are renamed and not modified can be detected by |
| 287 | examining the directory listing, looking for files with the same |
| 288 | size/date as the origin. |
| 289 | |
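The size/date heuristic from the last item can be sketched as below.
This assumes each directory listing is available as a dict mapping
path to a (size, mtime) pair; the function name guess_renames and
that representation are hypothetical.

```python
def guess_renames(source_listing, dest_listing):
    """Return {source_path: dest_path} for likely unmodified renames.

    Each listing maps a path to a (size, mtime) pair.
    """
    # Destination files with no same-named source file, grouped by signature.
    orphans = {}
    for path, sig in dest_listing.items():
        if path not in source_listing:
            orphans.setdefault(sig, []).append(path)

    renames = {}
    for path, sig in sorted(source_listing.items()):
        if path in dest_listing:
            continue  # same name on both sides: not a rename
        candidates = orphans.get(sig)
        # Only accept an unambiguous match; several candidates prove nothing.
        if candidates and len(candidates) == 1:
            renames[path] = candidates.pop()
    return renames
```

A match is only a hint. Equal size and date do not prove equal
content, so the candidate would still be used as a basis file and
checked rather than trusted outright.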
| 290 | |
| 291 | Filesystem migration: |
| 292 | |
| 293 | The NFSv4 working group wants atomic migration. Most of the |
| 294 | responsibility for this lies on the NFS server or OS. |
| 295 | |
| 296 | If migrating a whole tree, then we could do a nearly-atomic rename |
| 297 | at the end. This ties in to having separate basis and destination |
| 298 | files. |
| 299 | |
| 300 | NFSv4 probably wants to migrate file locks, but that's not really |
| 301 | our problem. |
| 302 | |
| 303 | |
| 304 | Scalability: |
| 305 | |
| 306 | We should aim to work well on machines in use in a year or two. |
| 307 | That probably means transfers of many millions of files in one |
| 308 | batch, and gigabytes or terabytes of data. |
| 309 | |
| 310 | For argument's sake: at the low end, we want to sync ten files for a |
| 311 | total of 10kB across a 1kB/s link. At the high end, we want to sync |
| 312 | 1e9 files for 1TB of data across a 1GB/s link. |
| 313 | |
| 314 | On the whole, CPU usage is not normally a limiting factor, if only |
| 315 | because running over SSH burns a lot of cycles on encryption. |
| 316 | |
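To put rough numbers on the two scenarios above, the arithmetic can
be written out as follows. Decimal units are assumed (1kB = 1000
bytes), and the 100 bytes per in-memory file-list entry is a made-up
figure chosen only to show the order of magnitude.

```python
KB, GB, TB = 10**3, 10**9, 10**12


def min_transfer_seconds(total_bytes, link_bytes_per_second):
    """Lower bound on the time to send everything with no delta savings."""
    return total_bytes / link_bytes_per_second


# Low end: ten files, 10kB in total, over a 1kB/s link.
low_seconds = min_transfer_seconds(10 * KB, 1 * KB)
low_avg_file = 10 * KB / 10

# High end: 1e9 files, 1TB in total, over a 1GB/s link.
high_seconds = min_transfer_seconds(1 * TB, 1 * GB)
high_avg_file = 1 * TB / 10**9

# Hypothetical cost of holding the whole high-end file list in memory.
ASSUMED_BYTES_PER_ENTRY = 100
file_list_bytes = 10**9 * ASSUMED_BYTES_PER_ENTRY
```

Both scenarios work out to the same average file size of 1kB, and
the full-copy lower bounds are 10 and 1000 seconds. What changes by
eight orders of magnitude is the file count, so under the assumed
entry size a complete in-memory file list at the high end would need
about 100GB.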
| 317 | |
| 318 | Streaming: |
| 319 | |
| 320 | A big attraction of rsync is that there are few round-trip delays: |
| 321 | basically only one to get started, and then everything is |
| 322 | pipelined. Round-trip delays are a problem with FTP and NFS (at |
| 323 | least up to v3). NFSv4 can pipeline operations, but building on |
| 324 | that is probably a bit complicated. |