Matt McCutchen's Filesystem Enhancements
========================================

The filesystem support in my custom kernel has a number of enhancements that
address what I consider to be deficiencies in functionality and security.

- Traversing sticky directories
- Creating entries in sticky directories
- Hard-linking others' files
- Moving others' directories
- Meaningful symlink permissions
- lchmod
- lutimes
- writelink (not implemented)
- Userspace support

Each enhancement has its own section below, followed by a section on userspace
support.

Traversing sticky directories
-----------------------------
Using this kernel, to access a file in someone else's sticky directory, you
must own the target file or have some permission (r, w, or x) on it.  If you
try to look up a file that isn't yours and on which you have no permission,
you get EPERM.  Readdir still returns all directory entries (and on some
filesystems it gives you the files' i-numbers and types); in a future version
of the custom kernel, readdir may omit entries you aren't allowed to traverse.

If you have only execute permission on a directory, the nonexistence of files
in the directory is concealed: if you try to look up a nonexistent file, you
get EPERM instead of ENOENT.  If you have read permission on the directory,
you can list it with readdir and see whether a given file name is listed,
while if you have write permission, you can attempt to create a new file with
that name and see whether you get EACCES because of an existing, unwritable
file.  Concealing a file's nonexistence would be pointless in either of these
cases, so you get ENOENT if you have read and/or write permission in addition
to execute.
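
The lookup rules above can be restated as a small pure function.  This is only
a sketch of the behavior described in this section, not the kernel's code: the
struct, the names, and the EACCES case for a directory without execute
permission are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Permission bits the caller effectively holds on an object. */
enum { PERM_R = 4, PERM_W = 2, PERM_X = 1 };

/* Hypothetical description of one lookup; not a kernel structure. */
struct lookup {
    bool dir_sticky;        /* directory has the sticky bit */
    bool caller_owns_dir;   /* the rules apply to others' directories */
    int  dir_perms;         /* caller's rwx bits on the directory */
    bool file_exists;
    bool caller_owns_file;
    int  file_perms;        /* caller's rwx bits on the target file */
};

/* Returns 0 if the lookup succeeds, else the errno described in the text. */
int lookup_errno(const struct lookup *l)
{
    assert(l != NULL);

    /* Assumption: without execute permission the usual EACCES applies. */
    if (!(l->dir_perms & PERM_X))
        return EACCES;

    bool restricted = l->dir_sticky && !l->caller_owns_dir;

    if (!l->file_exists) {
        /* Execute-only callers cannot learn that the name is absent. */
        if (restricted && !(l->dir_perms & (PERM_R | PERM_W)))
            return EPERM;
        return ENOENT;
    }

    /* Existing file: need ownership or at least one permission bit. */
    if (restricted && !l->caller_owns_file && l->file_perms == 0)
        return EPERM;
    return 0;
}
```

With execute-only access, both a missing name and a private file produce
EPERM, so the two cases cannot be told apart.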

Why is this behavior useful?  If you want to let everyone read a folder in
your home directory, you can set its permissions to 755.  To get to the
folder, however, people need execute on your home directory, and giving them
execute invites lots of abuse.  People can guess filenames and see if those
files exist in your home directory; if the files exist, people can stat them
and learn their atimes, mtimes, and sizes.  Maybe they won't hit upon any of
your personal files, but the names of your mailbox and your dotfiles are
likely to be well-known.  People can find out when you last got mail and how
big the mail was (if they've been watching the size of your mailbox), what
programs have written configuration files recently, and so forth.

If you use my kernel, you can stop the abuse by making your home directory
sticky (mode 1711).  People can still get to the public folder, but trying to
access any other name will give them EPERM.  They can't even learn whether
files of given names exist, let alone stat them.

Alternatively, you can set your home directory to mode 1755, in which case
others can list the names of all files but only see stat information for the
public ones.  If ls fails to stat a file, it shows question marks for the
file's attributes and gives the name a red background.  Maybe you've already
seen this if you've listed a directory of the silly mode 600.
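
Setting either mode needs nothing beyond ordinary chmod(2); only the
interpretation of the sticky bit is specific to this kernel.  A minimal sketch
follows, in which the function names are made up for illustration and the demo
works on a scratch directory under /tmp rather than a real home directory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Make `dir` sticky: mode 1755 if `listable` is nonzero, else mode 1711. */
int make_dir_sticky(const char *dir, int listable)
{
    assert(dir != NULL);
    return chmod(dir, S_ISVTX | (listable ? 0755 : 0711));
}

/* Demo on a scratch directory; returns the resulting mode bits or -1. */
int sticky_mode_demo(int listable)
{
    char dir[] = "/tmp/sticky-demo-XXXXXX";
    struct stat st;
    int result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    if (make_dir_sticky(dir, listable) == 0 && stat(dir, &st) == 0)
        result = (int)(st.st_mode & 07777);
    rmdir(dir);
    return result;
}
```

From a shell, "chmod 1711 ~" or "chmod 1755 ~" has the same effect.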

Here's what another user's home directory might look like:
        drwxr-xr-t  81 matt matt 4096 Jan 18 16:13 .
        drwxr-xr-x   3 root root 4096 Jan  8 16:46 ..
        ?---------   ? ?    ?       ?            ? .bashrc
        drwxr-xr-x   1 matt matt   18 Jan 18 16:14 public
        ?---------   ? ?    ?       ?            ? private

If you use qmail, making your home directory sticky will tell qmail to hold
your email.  I recommend you modify qmail to use the setuid bit instead of the
sticky bit to hold email since setuid on directories currently does nothing.
I created such a modified qmail for my computer.

Creating entries in sticky directories
--------------------------------------
In the standard kernel, linking or moving someone else's file into someone
else's sticky directory is legal but irreversible.  My kernel forbids this.
Just as it doesn't let you delete an entry for someone else's file from
someone else's sticky directory, it doesn't let you create such an entry.

Hard-linking others' files
--------------------------
My school's Linux server hosts a number of Web sites for various school clubs.
The members of each club belong to a group with permission to write to their
Web site.  This was set up long before default ACLs were available, so the
sysadmins needed a way to allow people to write to files added to Web sites by
other people with restrictive umasks.  So they wrote a cron job that would
forcibly run "chmod -R g+w" on each Web site.  They were lucky that nobody
was devious enough to hard-link /etc/passwd into a Web site, let its
group-write bit get turned on, and compromise the system.  One can avoid this
scenario by restricting hard linking.
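
The attack works because every hard link names the same inode, so a chmod
through one name changes the permissions seen through all of them.  The sketch
below demonstrates this with standard calls on a scratch file the caller owns;
the function name and the /tmp location are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create "victim" with mode 0600, hard-link it as "alias", add group write
 * through the alias (what "chmod g+w" does to 0600), and return the mode
 * bits now reported for the victim, or -1 on failure.
 */
int shared_mode_demo(void)
{
    char dir[] = "/tmp/hardlink-demo-XXXXXX";
    char victim[64], alias[64];
    struct stat vst, ast;
    int fd, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(victim, sizeof victim, "%s/victim", dir);
    snprintf(alias, sizeof alias, "%s/alias", dir);

    fd = open(victim, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        /* fchmod sidesteps the umask so the starting mode is exactly 0600. */
        if (fchmod(fd, 0600) == 0 &&
            link(victim, alias) == 0 &&
            chmod(alias, 0660) == 0 &&
            stat(alias, &ast) == 0 &&
            stat(victim, &vst) == 0) {
            assert(ast.st_ino == vst.st_ino);   /* both names, one inode */
            result = (int)(vst.st_mode & 07777);
        }
        close(fd);
        unlink(alias);
        unlink(victim);
    }
    rmdir(dir);
    return result;
}
```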

This kernel only lets you create a hard link to a file you don't own if you
"control" the directory entry by which you name the file, meaning that neither
stickiness nor lack of write permission on the containing directory prevents
you from deleting the entry.  In 99% of cases, this means you can hard link to a
file if and only if you can move it.  Immutable and append-only attributes on
the containing directory affect moving the file but not hard-linking it.
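
As a sketch, the "control" test can be written as a predicate over the usual
deletion rules.  The struct and function names are hypothetical, the check on
the destination directory is left out, and immutable and append-only
attributes are deliberately ignored, as the text says they are for linking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the entry through which the file is named. */
struct entry_ctx {
    bool dir_writable;      /* caller has write+execute on the directory */
    bool dir_sticky;
    bool caller_owns_dir;
    bool caller_owns_file;
};

/* Could the caller delete this entry, as far as mode bits are concerned? */
bool controls_entry(const struct entry_ctx *e)
{
    assert(e != NULL);
    if (!e->dir_writable)
        return false;
    /* Standard sticky rule: only the file's or directory's owner deletes. */
    if (e->dir_sticky && !e->caller_owns_file && !e->caller_owns_dir)
        return false;
    return true;
}

/* Owners may always link their own files; others must control the entry. */
bool may_hard_link(const struct entry_ctx *e)
{
    assert(e != NULL);
    return e->caller_owns_file || controls_entry(e);
}
```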

The upshot is that the only risk you take by running a recursive permission
resetter is that a user who controls a directory entry to one of your files
may be able to cause that file's permissions to change to the new value you
are applying.  If every file of yours whose directory entry is controlled by
someone else also grants that person read and write permission, this risk is
not a problem.  This is almost always the case, but watch out for programs
dumping core in untrusted directories.

It appears to me that most legitimate purposes for hard links to others' files
(e.g., saving disk space) are served equally well by symlinks.

Moving others' directories
--------------------------
At my school, people taking a certain computer class once copied their
programs into the teacher's dropbox on the above-mentioned Linux server.  The
submitted directories got 755 permissions, and the teacher could not delete
them when he was finished grading them!

In general, if you own a directory, you should be able to delete any file from
it.  However, you can only delete a directory if you have enough write
permission on the stuff inside to empty out the directory first.  I considered
having a system-wide trash area to which people can move offending directories
and a cron job to clean out the area as root, but a restriction on rename(2)
ruins this approach: since moving a directory causes its .. entry to change,
you can only move directories that you can write.

My custom kernel lifts the restriction so that a system-wide trash area can be
implemented.  If you wish to implement one, keep in mind that you need a trash
area on each filesystem that is writable by non-root users, and your move-to-
trash command needs to select the trash area on the same filesystem as the
file being trashed.
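
One way a move-to-trash command might choose the right trash area is to
compare device numbers.  This is a sketch under assumptions: the function name
and the idea of a fixed candidate list are illustrative, and no such command
ships with the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return the index of the first candidate trash directory that lives on
 * the same filesystem as `victim`, or -1 if none does or on error.
 */
int pick_trash_area(const char *victim, const char *const *areas, size_t n)
{
    struct stat vst, ast;

    assert(victim != NULL && areas != NULL);
    /* lstat: a symlink being trashed lives where the link itself is stored. */
    if (lstat(victim, &vst) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (stat(areas[i], &ast) == 0 && S_ISDIR(ast.st_mode) &&
            ast.st_dev == vst.st_dev)
            return (int)i;
    }
    return -1;
}
```

The command would then rename(2) the victim into the chosen area; rename
cannot cross filesystems, which is why the areas must match.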

Meaningful symlink permissions
------------------------------
This is the most exciting of my enhancements, but it is also potentially the
most disruptive.  On reiserfs filesystems mounted with the new "symlink-perms"
mount option, my kernel allows symlinks to have permissions and/or access
ACLs just like other files.  A newly created symlink gets its permissions
from the creator's umask or the directory's default ACL, as usual.
Permissions have the following meanings:

    - readlink requires read permission
    - writelink (not implemented) will require write permission
    - traversal requires execute permission
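
The mapping above, written out as a sketch.  The enum and function names are
invented for illustration, and `perms` stands for whatever rwx bits the caller
effectively holds on the symlink after the usual owner/group/other or ACL
evaluation.

```c
#include <assert.h>
#include <stdbool.h>

enum { LINK_R = 4, LINK_W = 2, LINK_X = 1 };
enum symlink_op { OP_READLINK, OP_WRITELINK, OP_TRAVERSE };

/* Does a caller holding `perms` on the symlink get to perform `op`? */
bool symlink_op_allowed(enum symlink_op op, int perms)
{
    switch (op) {
    case OP_READLINK:  return (perms & LINK_R) != 0;
    case OP_WRITELINK: return (perms & LINK_W) != 0;  /* planned only */
    case OP_TRAVERSE:  return (perms & LINK_X) != 0;
    }
    assert(!"unknown symlink operation");
    return false;
}
```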

This way, you can let people access some of the symlinks in a directory but
not others.  Or, if you are using files of secret names in conjunction with
directories that grant only execute permission, you can give others
execute-only symlinks that let them access the files but not learn their
names.  In addition, if you're using a symlink as a convenient miniature text
file, you can make it non-executable so people don't try to follow it.
(Unfortunately, this currently can't be done when you create a link.)

Symlink permissions are a nice complement to the enhanced sticky bit.  Using
earlier versions of the custom kernel, if you opened your home directory to
others (mode 1711), there was still no way to hide symlinks: they always
appeared normally in the file listing.  Now you can hide them by giving them
700 permissions.

My /usr/local/bin has root:wheel ownership, 2775 permissions, and a default
ACL of 775.  Few installers respect the default ACL, so I occasionally have
to fix the permissions of installed files.  Now I can scan down the
permissions column of the directory listing without being distracted by
"lrwxrwxrwx" entries.

Again, symlink permissions are only enforced, initialized as above, and
changeable by users on reiserfs filesystems mounted with the option
"symlink-perms".  On other filesystems, new symlinks get 777 permissions, and
anyone who can stat a symlink can traverse and readlink it.  (The necessary
changes to support symlink permissions are split between the VFS layer and
individual filesystems, and I didn't feel like modifying every single
filesystem implementation, so I modified only reiserfs because it is my
favorite.)  (I stopped enforcing symlink permissions on all filesystems when
it interfered with readlinking /proc/*/fd/* entries for files open for writing
only.)

If you use RPM, I recommend that you do not enable symlink permissions on your
root filesystem because they might confuse RPM verification.  On the other
hand, users would probably like symlink permissions supported on their home
directories.

lchmod
------
This system call changes the permissions on a file without following it if it
happens to be a symlink.  (Plain chmod will change the permissions on the
target of a symlink.)  Of course, you must own the file.  If the file is
indeed a symlink but symlink permissions are disabled on the filesystem, you
get ENOTSUP.

        #313: int lchmod(const char *linkpath, mode_t mode);
        #314: int lchmodat(int relative_to_fd,
                         const char *linkpath, mode_t mode);

By the way, you can change a symlink's access ACL with lsetxattr(2) or
"setfattr -h"; if ACLs and/or symlink permissions are disabled, you get
ENOTSUP.

lutimes
-------
Just as lchmod is a counterpart to chmod that does not follow symlinks, lutimes
is a counterpart to utimes that does not follow symlinks.  You can use it to
change the atime and mtime of a symlink on any filesystem.  (Originally
symlink times could only be changed on reiserfs, but this capability doesn't
seem too dangerous; a filesystem implementation that really can't handle
changing symlink times should complain in setattr.)  As with utimes, you must
have write permission to set the atime and mtime to the current time, and you
must own the file to set the atime and mtime arbitrarily.

        #315: int lutimes(const char *path, const struct timeval times[2]);
        #316: int lutimesat(int relative_to_fd,
                          const char *path, const struct timeval times[2]);
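
For comparison, stock Linux kernels offer the same effect through
utimensat(2) with AT_SYMLINK_NOFOLLOW.  The sketch below uses that standard
call, not the lutimes system call declared above; the function names and the
scratch directory are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Set a symlink's own atime and mtime (whole seconds), not its target's. */
int set_link_times(const char *path, time_t atime, time_t mtime)
{
    struct timespec ts[2] = {
        { .tv_sec = atime, .tv_nsec = 0 },
        { .tv_sec = mtime, .tv_nsec = 0 },
    };

    assert(path != NULL);
    return utimensat(AT_FDCWD, path, ts, AT_SYMLINK_NOFOLLOW);
}

/*
 * Demo: make a dangling symlink in a fresh temporary directory, stamp it,
 * and return the mtime that lstat reports (or -1 on any failure).
 */
long link_mtime_demo(void)
{
    char dir[] = "/tmp/lutimes-demo-XXXXXX";
    char link_path[64];
    struct stat st;
    long result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof link_path, "%s/mylink", dir);
    if (symlink("no-such-target", link_path) == 0) {
        if (set_link_times(link_path, 1146518442, 1146518442) == 0 &&
            lstat(link_path, &st) == 0)
            result = (long)st.st_mtime;
        unlink(link_path);
    }
    rmdir(dir);
    return result;
}
```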

writelink (not implemented)
---------------------------
My kernel adds two system calls to let you change a symlink's target in-place.
They are not yet implemented; they follow the path to the link but then give
ENOSYS.  Their declarations and system-call ID numbers are as follows:

        #311: int writelink(const char *linkpath, const char *new_target);
        #312: int writelinkat(int relative_to_fd,
                            const char *linkpath, const char *new_target);

I plan to consult the reiserfs people to learn how to implement changing
symlink targets in-place.  If this is practical, I will then add writelink
support for reiserfs, controlled by a mount option "writelink".  You probably
wouldn't want to enable "writelink" on filesystems that still create their
symlinks with 777 permissions.
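
Until writelink exists, the usual workaround is to create a fresh symlink
under a temporary name and rename it over the old one.  The sketch below shows
that standard technique, which is not in-place: the replacement is a new inode,
so any permissions, ACL, or times set on the old link are lost.  The function
names and the temporary-name scheme are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Point `linkpath` at `new_target` by atomically replacing the symlink. */
int replace_symlink(const char *linkpath, const char *new_target)
{
    char tmp[4096];
    int n;

    assert(linkpath != NULL && new_target != NULL);
    n = snprintf(tmp, sizeof tmp, "%s.new.%ld", linkpath, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (symlink(new_target, tmp) != 0)
        return -1;
    if (rename(tmp, linkpath) != 0) {
        int saved = errno;
        unlink(tmp);            /* do not leave the temporary link behind */
        errno = saved;
        return -1;
    }
    return 0;
}

/* Demo in a scratch directory; returns 0 if the link now reads "newtarget". */
int replace_symlink_demo(void)
{
    char dir[] = "/tmp/writelink-demo-XXXXXX";
    char link_path[64], buf[64];
    ssize_t len;
    int ok = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof link_path, "%s/mylink", dir);
    if (symlink("oldtarget", link_path) == 0) {
        if (replace_symlink(link_path, "newtarget") == 0) {
            len = readlink(link_path, buf, sizeof buf - 1);
            if (len >= 0) {
                buf[len] = '\0';
                ok = strcmp(buf, "newtarget") == 0 ? 0 : -1;
            }
        }
        unlink(link_path);
    }
    rmdir(dir);
    return ok;
}
```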

Userspace support
-----------------
I have made all the necessary changes to the kernel to support the
enhancements described here.  Some, which merely tighten security, show up in
userspace only as additional errors.  Others require corresponding changes to
userspace tools and libraries to be useful.

For example, to call any of the six new system calls, you must either use
syscall(2) and provide the system call number given here or use a customized C
library that knows about the calls.  I have made a customized glibc.
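
A sketch of the syscall(2) route follows.  The wrapper and constant names are
made up for illustration; only the numbers and argument lists come from the
declarations in this document.  The numbers are meaningful only on the custom
kernel: on a stock kernel they are unassigned or belong to unrelated system
calls, so these wrappers must not be called there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* System call numbers as listed in this document (custom kernel ONLY). */
enum {
    NR_MM_WRITELINK   = 311,
    NR_MM_WRITELINKAT = 312,
    NR_MM_LCHMOD      = 313,
    NR_MM_LCHMODAT    = 314,
    NR_MM_LUTIMES     = 315,
    NR_MM_LUTIMESAT   = 316
};

/* Change the permissions of a symlink itself. */
static inline long mm_lchmod(const char *linkpath, mode_t mode)
{
    assert(linkpath != NULL);
    return syscall(NR_MM_LCHMOD, linkpath, mode);
}

/* Change the atime and mtime of a symlink itself. */
static inline long mm_lutimes(const char *path, const struct timeval times[2])
{
    assert(path != NULL);
    return syscall(NR_MM_LUTIMES, path, times);
}

/* Currently expected to fail with ENOSYS, per the writelink section. */
static inline long mm_writelink(const char *linkpath, const char *new_target)
{
    assert(linkpath != NULL && new_target != NULL);
    return syscall(NR_MM_WRITELINK, linkpath, new_target);
}
```

Each wrapper returns -1 and sets errno on failure, as syscall(2) does.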

Some command-line utilities could also use enhancement.  Eventually I plan to
customize coreutils to add support for "chmod -h" and "touch -h", to customize
the acl utilities to add "getfacl -h" and "setfacl -h", and to make the "+"
indicating nontrivial ACLs appear when it should on symlinks in "ls -l" output.

In the meantime, I have prepared a small collection of proof-of-concept
userspace utilities that work but are rather inconvenient to use.  There's
lchmod:
        $ lchmod 0700 mylink
        $ lchmod 0775 mylink
There's lutimes:
        $ lutimes myfile 1146518442 0 1146518442 0   # atime{s,ns} mtime{s,ns}
        $ lutimes myfile                             # both to current time
And there's even writelink:
        $ writelink mylink newtarget
        writelink: Function not implemented

--------------
Matt McCutchen
hashproduct@gmail.com
http://kepreon.com/~matt/