Matt McCutchen's Web Site - rsync/rsync.git/blame_incremental

Commit	Line	Data
	1	-- indented-text --
	2
	3	URGENT ---------------------------------------------------------------
	4
	5
	6	IMPORTANT ------------------------------------------------------------
	7
	8	Cross-test versions
	9
	10	Part of the regression suite should be making sure that we don't
	11	break backwards compatibility: old clients vs new servers and so
	12	on. Ideally we would test the cross product of versions.
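
A minimal sketch of enumerating that cross product. The version labels are placeholders, not real releases, and the actual test run is left out.

```python
from itertools import product

# Hypothetical version labels; substitute whatever builds are available.
VERSIONS = ["old", "current", "next"]


def version_pairs(versions):
    """Return every (client, server) combination, same-version pairs included."""
    return list(product(versions, repeat=2))


if __name__ == "__main__":
    for client, server in version_pairs(VERSIONS):
        print(f"would test client={client} against server={server}")
```

The number of pairs grows as the square of the number of versions, which is one reason testing only against public servers might be the cheaper option.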
	13
	14	It might be sufficient to test downloads from well-known public
	15	rsync servers running different versions of rsync. This will give
	16	some testing and also be the most common case for having different
	17	versions and not being able to upgrade.
	18
	19	use chroot
	20
	21	If the platform doesn't support it, then don't even try.
	22
	23	If running as non-root, then don't fail, just give a warning.
	24	(There was a thread about this a while ago?)
	25
	26	http://lists.samba.org/pipermail/rsync/2001-August/thread.html
	27	http://lists.samba.org/pipermail/rsync/2001-September/thread.html
	28
	29	--files-from
	30
	31	Avoids traversal. Better option than a pile of --include statements
	32	for people who want to generate the file list using a find(1)
	33	command or a script.
	34
	35	File list structure in memory
	36
	37	Rather than one big array, perhaps have a tree in memory mirroring
	38	the directory tree.
	39
	40	This might make sorting much faster! (I'm not sure it's a big CPU
	41	problem, mind you.)
	42
	43	It might also reduce memory use in storing repeated directory names
	44	-- again I'm not sure this is a problem.
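
A rough sketch of the tree idea, written in Python rather than C for brevity. The nested-dict layout and the function names are assumptions for illustration only, not a proposal for the real data structure.

```python
def build_tree(paths):
    """Build a nested dict mirroring the directory tree.

    Directories map to dicts and files map to None, so each directory
    name is stored once however many entries live beneath it.
    """
    root = {}
    for path in paths:
        stripped = path.strip("/")
        if not stripped:
            continue
        parts = stripped.split("/")
        node = root
        for directory in parts[:-1]:
            child = node.get(directory)
            if child is None:
                # Either unseen, or previously recorded as a file.
                child = node[directory] = {}
            node = child
        node.setdefault(parts[-1], None)
    return root


def walk_sorted(tree, prefix=""):
    """Yield full paths, sorting each directory's entries separately."""
    for name in sorted(tree):
        child = tree[name]
        if child is None:
            yield prefix + name
        else:
            yield from walk_sorted(child, prefix + name + "/")
```

Sorting then happens per directory on short names rather than once over every full path. Note that the resulting order is not always identical to a plain string sort of the full paths, so whichever order the protocol needs would have to be checked.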
	45
	46	Performance
	47
	48	Traverse just one directory at a time. Tridge says it's possible.
	49
	50	At the moment rsync reads the whole file list into memory at the
	51	start, which makes us use a lot of memory and also not pipeline
	52	network access as much as we could.
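
A sketch of what one-directory-at-a-time traversal could look like, assuming a simple stack of pending directories. This only illustrates the shape of the idea; it says nothing about how the protocol would carry the partial lists.

```python
import os


def walk_one_dir_at_a_time(top):
    """Yield (directory, sorted entry names), one directory per step.

    Only the names of the current directory and the stack of directories
    still to visit are held in memory.
    """
    pending = [top]
    while pending:
        directory = pending.pop()
        names = []
        with os.scandir(directory) as entries:
            for entry in entries:
                names.append(entry.name)
                # Do not descend through symlinks to directories.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
        yield directory, sorted(names)
```

Because it is a generator, the consumer could start sending or comparing the first directory before the rest of the tree has been read.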
	53
	54
	55	Handling duplicate names
	56
	57	We need to be careful of duplicate names getting into the file list.
	58	See clean_flist(). This could happen if multiple arguments include
	59	the same file. Bad.
	60
	61	I think duplicates are only a problem if they're both flowing
	62	through the pipeline at the same time. For example we might have
	63	updated the first occurrence after reading the checksums for the
	64	second. So possibly we just need to make sure that we don't have
	65	both in the pipeline at the same time.
	66
	67	Possibly if we did one directory at a time that would be sufficient.
	68
	69	Alternatively we could pre-process the arguments to make sure no
	70	duplicates will ever be inserted. There could be some bad cases
	71	when we're collapsing symlinks.
	72
	73	We could have a hash table.
	74
	75	The root of the problem is that we do not want more than one file
	76	list entry referring to the same file. At first glance there are
	77	several ways this could happen: symlinks, hardlinks, and repeated
	78	names on the command line.
	79
	80	If names are repeated on the command line, they may be present in
	81	different forms, perhaps by traversing directory paths in different
	82	ways, traversing paths including symlinks. Also we need to allow
	83	for expansion of globs by rsync.
	84
	85	At the moment, clean_flist() requires having the entire file list in
	86	memory. Duplicate names are detected just by a string comparison.
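
A sketch of the hash-table alternative, assuming names are compared after a purely textual normalisation. dedupe_names is a made-up helper, not anything in rsync.

```python
import posixpath


def dedupe_names(names):
    """Drop later names that normalise to one already seen, keeping order.

    Assumption: textual normalisation is good enough. normpath collapses
    "a//b", "./a/b" and "a/x/../b" to "a/b"; collapsing ".." is not safe
    when "x" is a symlink, which is the bad case noted above.
    """
    seen = set()
    unique = []
    for name in names:
        key = posixpath.normpath(name)
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique
```

This detects duplicates as names arrive instead of needing the whole list sorted first, though the set of seen names still grows with the list.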
	87
	88	We don't need to worry about hard links causing duplicates because
	89	files are never updated in place. Similarly for symlinks.
	90
	91	I think even if we're using a different symlink mode we don't need
	92	to worry.
	93
	94	Unless we're really clever this will introduce a protocol
	95	incompatibility, so we need to be able to accept the old format as
	96	well.
	97
	98
	99	Memory accounting
	100
	101	At exit, show how much memory was used for the file list, etc.
	102
	103	Also we do a weird exponential-growth allocation in flist.c. I'm
	104	not sure this makes sense with modern mallocs. At any rate it will
	105	make us allocate a huge amount of memory for large file lists.
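
To put a number on the worry, here is a small calculation assuming the array capacity doubles from an initial size of 1024 entries. Both figures are assumptions for illustration; the real growth rule in flist.c should be checked.

```python
def capacity_after(n_entries, initial=1024, factor=2):
    """Return the allocated capacity once n_entries have been stored,
    assuming the capacity is multiplied by factor whenever it runs out."""
    capacity = initial
    while capacity < n_entries:
        capacity *= factor
    return capacity


if __name__ == "__main__":
    for n in (1_000_000, 1_048_577):
        cap = capacity_after(n)
        print(f"{n} entries -> capacity {cap}, unused {cap - n}")
```

Under these assumptions, a list that is just past a power-of-two boundary leaves almost half of the allocation unused.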
	106
	107
	108	Hard-link handling
	109
	110	At the moment hardlink handling is very expensive, so it's off by
	111	default. It does not need to be so.
	112
	113	Since most of the solutions are rather intertwined with the file
	114	list it is probably better to fix that first, although fixing
	115	hardlinks is possibly simpler.
	116
	117	We can rule out hardlinked directories since they will probably
	118	screw us up in all kinds of ways. They simply should not be used.
	119
	120	At the moment rsync only cares about hardlinks to regular files. I
	121	guess you could also use them for sockets, devices and other beasts,
	122	but I have not seen them.
	123
	124	When trying to reproduce hard links, we only need to worry about
	125	files that have more than one name (nlinks>1 && !S_ISDIR).
	126
	127	The basic point of this is to discover alternate names that refer to
	128	the same file. All operations, including creating the file and
	129	writing modifications to it need only to be done for the first name.
	130	For all later names, we just create the link and then leave it
	131	alone.
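
A sketch of that rule, assuming each file list entry carries device, inode, link count and mode. The Entry tuple and plan_hard_links are hypothetical names, and the real code works on the C file list rather than anything like this.

```python
import stat
from collections import namedtuple

# Hypothetical stand-in for a file list entry.
Entry = namedtuple("Entry", "name dev ino nlink mode")


def plan_hard_links(entries):
    """Return a list of ("transfer", name, None) or ("link", name, first_name).

    Only non-directories with more than one link are tracked; the first
    name seen for a (dev, ino) pair is transferred, later ones are linked.
    """
    first_name = {}
    plan = []
    for entry in entries:
        if entry.nlink > 1 and not stat.S_ISDIR(entry.mode):
            key = (entry.dev, entry.ino)
            if key in first_name:
                plan.append(("link", entry.name, first_name[key]))
                continue
            first_name[key] = entry.name
        plan.append(("transfer", entry.name, None))
    return plan
```

The table only holds files with nlinks>1, so for a typical tree it should stay much smaller than the file list itself.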
	132
	133	If hard links are to be preserved:
	134
	135	Before the generator/receiver fork, the list of files is received
	136	from the sender (recv_file_list), and a table for detecting hard
	137	links is built.
	138
	139	The generator looks for hard links within the file list and does
	140	not send checksums for them, though it does send other metadata.
	141
	142	The sender sends the device number and inode with file entries, so
	143	that files are uniquely identified.
	144
	145	The receiver goes through and creates hard links (do_hard_links)
	146	after all data has been written, but before directory permissions
	147	are set.
	148
	149	At the moment device and inum are sent as 4-byte integers, which
	150	will probably cause problems on large filesystems. On Linux the
	151	kernel uses 64-bit ino_t's internally, and people will soon have
	152	filesystems big enough to use them. We ought to follow NFS4 in
	153	using 64-bit device and inode identification, perhaps with a
	154	protocol version bump.
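
A small demonstration of the problem. The byte order and field layout here are assumptions chosen for the example and are not rsync's actual wire format.

```python
import struct


def pack32(dev, ino):
    """Pack device and inode as two 4-byte integers, truncating high bits."""
    return struct.pack("<II", dev & 0xFFFFFFFF, ino & 0xFFFFFFFF)


def pack64(dev, ino):
    """Pack device and inode as two 8-byte integers."""
    return struct.pack("<QQ", dev, ino)


if __name__ == "__main__":
    small = (1, 5)
    large = (1, 5 + 2**32)  # a different inode on a large filesystem
    print("32-bit collision:", pack32(*small) == pack32(*large))
    print("64-bit collision:", pack64(*small) == pack64(*large))
```

With 4-byte fields the two distinct files become indistinguishable, so they would be wrongly treated as hard links to each other.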
	155
	156	Once we've seen all the names for a particular file, we no longer
	157	need to think about it and we can deallocate the memory.
	158
	159	We can also have the case where there are links to a file that are
	160	not in the tree being transferred. There's nothing we can do about
	161	that. Because we rename the destination into place after writing,
	162	any hardlinks to the old file are always going to be orphaned. In
	163	fact that is almost necessary because otherwise we'd get really
	164	confused if we were generating checksums for one name of a file and
	165	modifying another.
	166
	167	At the moment the code seems to make a whole second copy of the file
	168	list, which seems unnecessary.
	169
	170	We should have a test case that exercises hard links. Since it
	171	might be hard to compare ./tls output where the inodes change we
	172	might need a little program to check whether several names refer to
	173	the same file.
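
Such a checker could be very small. A sketch in Python, with same_file being a made-up name; a C version would compare st_dev and st_ino from lstat() in the same way.

```python
import os
import sys


def same_file(paths):
    """Return True if every path refers to the same (device, inode) pair."""
    identities = set()
    for path in paths:
        st = os.lstat(path)
        identities.add((st.st_dev, st.st_ino))
    return len(identities) == 1


if __name__ == "__main__":
    # Exit status 0 if all names given on the command line are one file.
    sys.exit(0 if same_file(sys.argv[1:]) else 1)
```

A test would run it against the source names and again against the destination names, and expect the same answer from both.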
	174
	175	IPv6
	176
	177	Implement suggestions from http://www.kame.net/newsletter/19980604/
	178	and ftp://ftp.iij.ad.jp/pub/RFC/rfc2553.txt
	179
	180	If a host has multiple addresses, then try to connect to each in
	181	order until we get through. (getaddrinfo may return multiple
	182	addresses.) This is kind of implemented already.
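
The connect loop, sketched in Python. connect_any is a hypothetical helper; Python's own socket.create_connection does much the same thing, and the C code would loop over the getaddrinfo() result list in the same way.

```python
import socket


def connect_any(host, port, timeout=10.0):
    """Try each address returned by getaddrinfo in order; return the
    first socket that connects, or raise the last error seen."""
    last_error = None
    for family, socktype, proto, _canon, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(addr)
            return sock
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses found for %r" % (host,))
```

Asking for AF_UNSPEC lets the resolver return both IPv6 and IPv4 addresses, so a host that is unreachable over one family can still be reached over the other.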
	183
	184	Possibly also when starting as a server we may need to listen on
	185	multiple passive addresses. This might be a bit harder, because we
	186	may need to select on all of them. Hm.
	187
	188	Define a syntax for IPv6 literal addresses. Since they include
	189	colons, they tend to break most naming systems, including ours.
	190	Based on the HTTP IPv6 syntax, I think we should use
	191
	192	rsync://[::1]/foo/bar
	193	[::1]::bar
	194
	195	which should just take a small change to the parser code.
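
A sketch of how the two bracketed forms might be recognised. The regular expressions and parse_daemon_spec are illustrative assumptions and do not reflect the existing parser.

```python
import re

# Host is either a bracketed IPv6 literal or a name without ':' or '/'.
_HOST = r"(?:\[([0-9A-Fa-f:.]+)\]|([^/:\[\]]+))"
_URL = re.compile(r"^rsync://" + _HOST + r"(?::(\d+))?/(.*)$")
_DAEMON = re.compile(r"^" + _HOST + r"::(.*)$")


def parse_daemon_spec(spec):
    """Return (host, port, path), or None if spec matches neither form."""
    match = _URL.match(spec)
    if match:
        host = match.group(1) or match.group(2)
        port = int(match.group(3)) if match.group(3) else None
        return host, port, match.group(4)
    match = _DAEMON.match(spec)
    if match:
        return match.group(1) or match.group(2), None, match.group(3)
    return None
```

The brackets are what make the address unambiguous: an unbracketed "::1::bar" cannot be split reliably, so this sketch rejects it.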
	196
	197	Errors
	198
	199	If we hang or get SIGINT, then explain where we were up to. Perhaps
	200	have a static buffer that contains the current function name, or
	201	some kind of description of what we were trying to do. This is a
	202	little easier on people than needing to run strace/truss.
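
The static-buffer idea, sketched in Python. In C this would be a static char array updated at the start of each major phase; the function names here are made up.

```python
import signal
import sys

# Plays the role of the static buffer: a description of the current phase.
_current_activity = "starting up"


def set_activity(description):
    """Record what we are about to do, e.g. "receiving file list"."""
    global _current_activity
    _current_activity = description


def describe_activity():
    """Return the message shown when we are interrupted."""
    return "interrupted while " + _current_activity


def _on_sigint(signum, frame):
    sys.stderr.write(describe_activity() + "\n")
    sys.exit(130)  # conventional exit status for death by SIGINT


def install_handler():
    """Report the current activity on SIGINT instead of dying silently."""
    signal.signal(signal.SIGINT, _on_sigint)
```

Updating the description costs almost nothing, so it can be done at every phase change without affecting performance.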
	203
	204	"The dungeon collapses! You are killed." Rather than "unexpected
	205	eof" give a message that is more detailed if possible and also more
	206	helpful.
	207
	208	File attributes
	209
	210	Device major/minor numbers should be at least 32 bits each. See
	211	http://lists.samba.org/pipermail/rsync/2001-November/005357.html
	212
	213	Transfer ACLs. Need to think of a standard representation.
	214	Probably better not to even try to convert between NT and POSIX.
	215	Possibly can share some code with Samba.
	216
	217	Empty directories
	218
	219	With the current common --include '*/' --exclude '*' pattern, people
	220	can end up with many empty directories. We might avoid this by
	221	lazily creating such directories.
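
One way lazy creation could work, as a sketch: create a directory only at the moment a file is about to be written into it. write_file is a hypothetical helper.

```python
import os


def write_file(dest_root, relpath, data):
    """Write data to dest_root/relpath, creating parent directories on demand.

    Directories that never receive a file are never created.
    """
    target = os.path.join(dest_root, relpath)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(data)
```

The cost is that directories which are genuinely empty on the sender would no longer be reproduced, so this would probably need to be optional.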
	222
	223	zlib
	224
	225	Perhaps don't use our own zlib. Will we actually be incompatible,
	226	or just be slightly less efficient?
	227
	228	logging
	229
	230	Perhaps flush stdout after each filename, so that people trying to
	231	monitor progress in a log file can do so more easily. See
	232	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=48108
	233
	234	At the moment, connections that just get a list of modules are not
	235	logged, but they should be.
	236
	237	rsyncd over ssh
	238
	239	There are already some patches to do this.
	240
	241	proxy authentication
	242
	243	Allow RSYNC_PROXY to be http://user:pass@proxy.foo:3128/, and do
	244	HTTP Basic Proxy-Authentication.
	245
	246	Multiple schemes are possible, up to and including the insanity that
	247	is NTLM, but Basic probably covers most cases.
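
A sketch of the Basic case, assuming the URL form proposed above is accepted for RSYNC_PROXY. proxy_settings is a made-up helper; the header follows the usual HTTP Basic scheme of base64("user:password").

```python
import base64
from urllib.parse import urlsplit, unquote


def proxy_settings(rsync_proxy):
    """Return (host, port, headers) parsed from an RSYNC_PROXY-style URL."""
    parts = urlsplit(rsync_proxy)
    headers = {}
    if parts.username is not None:
        user = unquote(parts.username)
        password = unquote(parts.password or "")
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        headers["Proxy-Authorization"] = "Basic " + token.decode("ascii")
    return parts.hostname, parts.port, headers
```

Basic sends the password essentially in the clear, so this is only as safe as the network path to the proxy.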
	248
	249	SOCKS
	250
	251	Add --with-socks, and then perhaps a command-line option to put them
	252	on or off. This might be more reliable than LD_PRELOAD hacks.
	253
	254	PLATFORMS ------------------------------------------------------------
	255
	256	Win32
	257
	258	Don't detach, because this messes up --srvany.
	259
	260	http://sources.redhat.com/ml/cygwin/2001-08/msg00234.html
	261
	262	According to "Effective TCP/IP Programming" (??) close() on a socket
	263	has incorrect behaviour on Windows -- it sends a RST packet to the
	264	other side, which gives a "connection reset by peer" error. On that
	265	platform we should probably do shutdown() instead. However, on Unix
	266	we are correct to call close(), because shutdown() discards
	267	untransmitted data.
	268
	269	DOCUMENTATION --------------------------------------------------------
	270
	271	Update README
	272
	273	BUILD FARM -----------------------------------------------------------
	274
	275	Add machines
	276
	277	AMDAHL UTS (Dave Dykstra)
	278
	279	Cygwin (on different versions of Win32?)
	280
	281	HP-UX variants (via HP?)
	282
	283	SCO
	284
	285	NICE -----------------------------------------------------------------
	286
	287	--no-detach and --no-fork options
	288
	289	Very useful for debugging. Also good when running under a
	290	daemon-monitoring process that tries to restart the service when the
	291	parent exits.
	292
	293	hang/timeout friendliness
	294
	295	verbose output
	296
	297	Indicate whether files are new, updated, or deleted
	298
	299	internationalization
	300
	301	Change to using gettext(). Probably need to ship this for platforms
	302	that don't have it.
	303
	304	Solicit translations.
	305
	306	Does anyone care?
	307
	308	rsyncsh
	309
	310	Write a small emulation of interactive ftp as a Python program
	311	that calls rsync. Commands such as "cd", "ls", "ls *.c" etc map
	312	fairly directly into rsync commands: it just needs to remember the
	313	current host, directory and so on. We can probably even do
	314	completion of remote filenames.
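
A very small sketch of the state such a program would keep and how "cd" and "ls" might map onto rsync invocations. The class and its methods are hypothetical; it only builds argument lists and does not run rsync. It assumes the remote-shell "host:path" form, and that running rsync with a source and no destination lists the source.

```python
import posixpath


class RsyncShell:
    """Remember the current host and directory, like an ftp session."""

    def __init__(self, host, directory="/", rsync="rsync"):
        self.host = host
        self.directory = directory
        self.rsync = rsync

    def cd(self, path):
        """Change the remembered remote directory; nothing is contacted."""
        self.directory = posixpath.normpath(
            posixpath.join(self.directory, path))

    def ls_command(self, pattern=""):
        """Return the argv that would list the directory or a pattern in it."""
        remote = posixpath.join(self.directory, pattern)
        if not pattern:
            # Trailing slash: list the directory's contents, not the directory.
            remote = remote.rstrip("/") + "/"
        return [self.rsync, f"{self.host}:{remote}"]
```

For example, after cd("src") from /pub, ls_command("*.c") would produce an rsync command for host:/pub/src/*.c, leaving the wildcard for the remote side to expand.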
	315
	316	%K%