Status: intermittently active, parts obsolete; 2005-present
Here I collect utilities I use or previously used on my Linux system. Some of them may work on other unix-like systems. They vary widely in maturity: caveat emptor! Your feedback and contributions are welcome: please email me.
Since 2020-09-02, the actual files are in this git repository for your convenience in keeping your copies up to date and merging modifications, except as indicated on this page. The main documentation is still maintained on this page, although many of the files also have comments in them.
lesspipe.sh script, based on an old version of Fedora's.
It supports a number of additional file types,
but I haven't kept it up to date with Fedora's.
Better options may be available on the web;
I wanted to get this out and didn't want to take the time to research other options.

I use these tools to help manage my traditional Fedora system on which
RPM transactions mutate the root filesystem over time.
They have some limitations and make some assumptions specific to my setup,
but the latter should be easy to change.
I'd like to move to a system that generates the root filesystem reproducibly
from a specification, likely reusing code or at least ideas from
Fedora Silverblue
(Silverblue doesn't appear to do everything I need out of the box),
which would make these tools obsolete. In the meantime, rpm-audit
keeps at least the set of installed packages reproducible (with minor caveats).
With the packaged versions of configuration files saved by rpmconf-matt,
it wouldn't be hard to write a tool that diffs the root filesystem against the
packaged state, though I haven't done so yet. (If you do, let me know!)
However, the problem remains that scriptlets mutate things from their packaged state,
so one has to know what changes from scriptlets are expected (or what expected changes
from scriptlets are missing!). Efficient re-running of scriptlets (ideally incremental,
if there's any way to achieve that) is a major problem that any tool for
reproducible generation of the root filesystem would have to solve.

rpmconf-matt: Copies the original versions of RPM-managed configuration files to
*.rpmbase and invokes a three-way merge tool to resolve *.rpmsave and *.rpmnew files.
(Read the security notice at the top of the file before using.)
This has largely removed what for many years was a huge pain of
comparing two versions of a configuration file and researching which version to keep
at each difference, though there are still cases in which a scriptlet changes something in a
configuration file and I have to research whether I should keep the change.
The tool is named rpmconf-matt to distinguish it from the rpmconf in Fedora,
which does not support three-way merges and thus provides little benefit in my view.
I had hoped to get the three-way merge functionality cleaned up and added to rpmconf
and thereby available in Fedora, but I haven't gotten around to it.
Apparently dpkg has an analogous tool, ucf --three-way,
and some other package managers may as well.

rpm-overrides-matt: Same idea as rpmconf-matt, for when I need
to modify an RPM-managed file that is not marked as configuration and thus is just overwritten
by RPM when the package is upgraded. Less mature than rpmconf-matt.
On a dpkg-based system, a diversion
would stop dpkg from overwriting the file,
but a three-way merge tool would still be needed if the administrator needs to actually
merge changes rather than just reference the original file via an include or the like.

rpm-audit: Verifies that the sets of installed and "userinstalled" (per dnf)
packages match what one would get by installing all
the packages and provides listed in a wants file on an empty system.

The simple system-update script brings together all three tools
to do a complete system upgrade, audit, and file merge.
If you want to install some packages from an updates-testing
or similar repository (or even a different Fedora release with dnf --releasever=X) without
permanently enabling the whole repository, you'll need to dnf download
the packages to a permanently enabled local repository in order to pass rpm-audit.
See the next subsection for a consideration about doing that.
If you copy some RPMs from a repository you don't enable by default on your system to your own repository,
it's best to copy complete sets of packages built from given SRPMs to ensure that you don't
accidentally install mismatching RPMs if you forget to add another RPM built from the same SRPM to your repository
before installing that package name.
Unfortunately, as of this writing (2020-09-18), dnf repoquery
does not have a built-in option to list all RPMs built from given SRPMs,
but you can use this dnf-repoquery-by-srpm tool instead.
So the command to download all packages built from given SRPMs to the current directory would be something like:
dnf download [OPTIONS] $(dnf-repoquery-by-srpm [OPTIONS] N-V-R.src.rpm [N2-V2-R2.src.rpm ...])
Note that options like --enablerepo or --releasever
must be passed in two places.
Alternatively, if you want an entire Fedora update (which may include RPMs
built from more than one SRPM), you can use dnf updateinfo --list, but many
third-party repositories do not have an analogous update system that publishes metadata for
dnf updateinfo.
If you want to build custom RPMs locally, I highly recommend
Mock (available in Fedora in the mock package).
It builds in an isolated environment, helping you achieve functional reproducibility.
(Note that achieving bit-for-bit reproducibility
often requires much more legwork. Also, Mock is not designed to protect your system from malicious build inputs;
I recommend using a separate Qubes OS VM for that.)
However, an obstacle you're likely to encounter, at least in Fedora, is that by default,
Mock builds against a standard dnf configuration for the Fedora repositories.
You'd probably prefer to build against your own system's dnf repository configuration,
which might include third-party repositories or even your own previously built custom RPMs.
For RPM source repositories formatted like the official Fedora ones,
you can achieve this with this Mock configuration file,
which should be placed in /etc/mock.
You can select it by passing -r host to Mock
or by symlinking /etc/mock/default.cfg to host.cfg.
If you are using fedpkg mockbuild, you must pass --mock-config host,
since fedpkg's default choice of Mock configuration is based on the package rather than the system default.
In the past, I've used a similar Mock configuration to build custom Qubes OS RPMs for VMs rather than using
the official qubes-builder, which works somewhat differently.
If and when I need that configuration again and bring it up to date, I will post it here.
(Building RPMs for dom0 requires a different process since your VM repository configuration won't match dom0's.)
As a reminder, in addition to using one of these configuration files,
you probably want to set the following in your ~/.config/mock.cfg:
config_opts['macros']['%packager'] = 'YOUR_NAME <YOUR_EMAIL>'
Here are the tools I use to analyze the server logs for this web site to understand how people are using the site and prioritize improvements, since a quick web search didn't find another tool that did what I wanted. Notable features:
An overview (generated/requests) that groups requests by HTTP response status code (which separates site problems and obvious abuse from successful requests, so that site problems are easier to spot and abuse doesn't distract while reviewing successful requests), then by request URL (sorted by descending count), then by referrer (sorted by descending count). Among requests from a given client IP address, consecutive identical requests (e.g., successive range requests for a streaming video) are counted as one.
Logs grouped by client IP address (generated/logs-by-ip) so you can see the sequence of requests from one IP address to understand how the user navigated the site.
(I know an IP address is an imperfect indicator of a user,
but I haven't bothered to set a cookie,
and my current web host doesn't give me an easy way to include cookies in the access log anyway.)

Sample overview output (some portions omitted and comments added):
=== STATUS CODE 200 ===
86 "GET /site/style.css"
    64 "https://mattmccutchen.net/bigint/"
    9 "https://mattmccutchen.net/"
    ...
77 "GET /bigint/"
    40 "-"
    23 "https://www.google.com/"
    ...
# More concise output if all requests for the same URL have the same referrer
4 "GET /escape/icon.png" "https://mattmccutchen.net/escape/"
...
=== STATUS CODE 404 ===
...
2 "GET //wordpress/wp-includes/wlwmanifest.xml" "-"  # abuse
...
# Oops... site misconfiguration fixed on 2020-09-16
1 "GET /app-downloads/escapesetup-windows-201609050.mattmccutchen202008280.exe" "https://mattmccutchen.net/escape/"
...
The obvious site-specific parameters are taken from a separate configuration file, but there are plenty of other assumptions specific to my setup that you may need to change in order to use this.
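The grouping described above can be sketched in Python as follows. This is only an illustration, not the actual tool: the record layout (a tuple of client IP, status code, request line, and referrer, already parsed from the access log) and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

def collapse_consecutive(records):
    """Drop a request that repeats the previous request from the same IP.

    Each record is a hypothetical (ip, status, request, referrer) tuple,
    given in log order.
    """
    last_by_ip = {}
    kept = []
    for rec in records:
        ip = rec[0]
        if last_by_ip.get(ip) != rec:
            kept.append(rec)
        last_by_ip[ip] = rec
    return kept

def overview(records):
    """Group by status code, then request (descending count), then referrer."""
    tree = defaultdict(lambda: defaultdict(Counter))
    for ip, status, request, referrer in collapse_consecutive(records):
        tree[status][request][referrer] += 1
    lines = []
    for status in sorted(tree):
        lines.append(f"=== STATUS CODE {status} ===")
        by_request = sorted(tree[status].items(),
                            key=lambda item: -sum(item[1].values()))
        for request, referrers in by_request:
            total = sum(referrers.values())
            if len(referrers) == 1:
                # Concise form when all requests share one referrer.
                (referrer,) = referrers
                lines.append(f'{total} "{request}" "{referrer}"')
            else:
                lines.append(f'{total} "{request}"')
                for referrer, count in referrers.most_common():
                    lines.append(f'    {count} "{referrer}"')
    return lines
```

Calling print("\n".join(overview(records))) on parsed log records would produce output shaped like the sample above; parsing the access log itself is left out because the log format depends on the web host.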
My modified version of Stow 1.3.3 (very old) with the following enhancements:
stow; should probably be updated to check for a .stow file. May be worth upstreaming.
update-mime-database and the like. May be worth upstreaming.
.dontfold file that ensures its containing directory is unfolded even if that package is the only one contributing content. Back in 2009, I converted Fedora's filesystem package to a Stow package of .dontfold files to help ensure that an accidental make install into the target directory wouldn't create files in folded directories of random packages, which would be difficult to track down and remove. However, I've never had the motivation to keep this Stow package up to date, so I'm thinking upstream's new --no-folding option may be better even if it's expensive.

retex is a TeX wrapper script that makes TeX compiling fit more nicely into build processes. For example, it exits nonzero immediately if an error occurs, and it repeats until a fixed point is reached in order to handle LaTeX references correctly. Better tools may exist.
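The "repeat until a fixed point" idea can be sketched in Python like this. It is not how retex itself is written: the choice to detect the fixed point by hashing the .aux file, the pdflatex options, and all function names here are assumptions for illustration.

```python
import hashlib
import subprocess
from pathlib import Path

def digest(paths):
    """Hash the contents of the given files; missing files count as empty."""
    h = hashlib.sha256()
    for path in paths:
        p = Path(path)
        h.update(p.read_bytes() if p.exists() else b"")
        h.update(b"\0")
    return h.hexdigest()

def run_until_stable(step, snapshot, max_runs=5):
    """Call step() until snapshot() is unchanged across a run.

    Returns the number of runs performed; raises RuntimeError if no
    fixed point is reached within max_runs.
    """
    before = snapshot()
    for run in range(1, max_runs + 1):
        step()
        after = snapshot()
        if after == before:
            return run
        before = after
    raise RuntimeError(f"no fixed point after {max_runs} runs")

def compile_tex(source, engine="pdflatex"):
    """Assumed usage: rerun the engine until the .aux file stops changing.

    Assumes source is a file name in the current directory.
    """
    aux = Path(source).with_suffix(".aux")

    def step():
        # check=True raises CalledProcessError as soon as a run fails.
        subprocess.run([engine, "-interaction=nonstopmode",
                        "-halt-on-error", source], check=True)

    return run_until_stable(step, lambda: digest([aux]))
```

A build script could call compile_tex("paper.tex") and exit nonzero on CalledProcessError or RuntimeError. The cap on the number of runs guards against documents whose auxiliary files never settle.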
ftc packages a file tree in a single file of a simplistic format that I designed, and ftx extracts such package files. They have the same purposes as tar -c and tar -x respectively but have no bells or whistles. They handle binary files safely, but a package of only text files is itself a text file.
gitar ("git archive") uses the git backend to make really small packages out of file trees with lots of redundancy. ungitar unpacks the packages; it requires ftx.
A .gitar package consists of an ftc package containing a bare git repository whose HEAD is the original tree and whose objects are all stored in a single pack. Hence, git will represent similar files in the original tree as deltas.
gitar is great for compressing together several versions of the same piece of software. I had seven versions of my custom rsync lying around, each about 585 KB as a tar-bz2 package. I unpacked them all inside a single folder and gitared that folder; the resulting gitar package was only 865 KB. Of course, if you can be bothered to import the sequence of versions into git as a proper sequence of commits, that's much better.
Note (2008-06-01): A while ago, git gained a standardized binary format for "bundles"; I should change gitar and ungitar to use bundles rather than my ftc format.
patchsync synchronizes a trunk, a branch, and a patch that contains the differences between the two. If the trunk or patch changes, it updates the branch; if the branch changes, it updates the patch. I developed patchsync to help me follow branches of rsync, but I no longer use it for that purpose. Depending on your situation, you may prefer a more sophisticated patch-management tool such as StGIT.
To set up a patchsync staging directory, run:
patchsync --new trunk patch branch where-to-create-staging
Then, to synchronize, run:
patchsync staging
Read the gigantic comment at the top of patchsync for much more information.
From 2006.12.16 to 2006.12.24, a development version of patchsync was mistakenly identified as version 2.2. As of 2006.12.24, the real version 2.2 is posted.
xsltdepcomp (named after depcomp) runs an XSL transform with xsltproc and generates a dependency makefile for the files read, thanks to xsltproc --load-trace. It is used in the build system for this web site.
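As a rough illustration of the idea (not the xsltdepcomp source), the following Python sketch turns load-trace output into a makefile rule. It assumes that each trace line has the form Loaded URL="..." ID="...", and the function names are made up for the example.

```python
import re

# Assumed shape of a load-trace line: Loaded URL="path" ID="..."
LOADED = re.compile(r'^Loaded URL="([^"]*)"')

def deps_from_trace(trace_text):
    """Return the loaded files in first-seen order, without duplicates."""
    deps = []
    for line in trace_text.splitlines():
        m = LOADED.match(line)
        if m and m.group(1) not in deps:
            deps.append(m.group(1))
    return deps

def make_rule(target, deps):
    """Format a makefile dependency rule, escaping spaces in file names."""
    escaped = [d.replace(" ", "\\ ") for d in deps]
    return f"{target}: {' '.join(escaped)}\n"
```

The trace text would come from running xsltproc --load-trace and capturing its diagnostic output (which I believe goes to standard error); the resulting rule can be written to a file that the main makefile includes.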
continusync is a Perl script around inotifywait and rsync that performs continuous mirroring, as suggested by Buck Huppmann. It is currently experimental and rather inefficient, but it does appear to work in simple cases. If you want to use it, I would be much obliged if you improved it as necessary and sent me the improved version.
firefox-isolated is a Firefox wrapper script that creates and uses a disposable profile. You can make it harder for people to correlate your activities across multiple Web sites by browsing each site with a separate Firefox profile created by this script. This script was inspired by the Facebook Beacon outrage.
To use this script, install it in your $PATH and name your master Firefox profile (from which the disposable ones will be copied) 00000000.master; then run firefox-isolated. Your mileage may vary.
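The general approach (copy the master profile, run Firefox on the copy, then throw the copy away) might look like the following Python sketch. This is not the firefox-isolated script itself; the function names and the temporary-directory layout are assumptions, and the master profile path must be supplied by the caller.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def make_disposable_profile(master, parent=None):
    """Copy the master profile into a fresh temporary directory."""
    tmp = Path(tempfile.mkdtemp(prefix="firefox-isolated-", dir=parent))
    profile = tmp / "profile"
    shutil.copytree(master, profile)
    return profile

def run_isolated(master, firefox="firefox"):
    """Run Firefox on a disposable copy of master, then delete the copy."""
    profile = make_disposable_profile(master)
    try:
        # -no-remote keeps this instance separate from a running Firefox.
        subprocess.run([firefox, "-no-remote", "-profile", str(profile)])
    finally:
        shutil.rmtree(profile.parent, ignore_errors=True)
```

For example, run_isolated(path_to_profiles / "00000000.master") would browse in a copy that is removed when Firefox exits, where path_to_profiles stands for wherever your Firefox profiles live.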
Last update 2007-12-02: Initial posting. Seems to work.
As of 2020-09-02, Firefox containers are much more convenient to use, although it's possible there are some Firefox features (perhaps some of these?) that containers and the like do not yet properly isolate but a separate profile would.
ntfsresizecopy copies an NTFS filesystem from one block device to another, resizing it to the size of the destination device in the process. (It uses ntfsprogs.) This is EXPERIMENTAL; after using this script, you should mount the destination read-only and check that everything looks intact.
An expanding copy is done simply with ntfsclone followed by ntfsresize. A shrinking copy is done by running ntfsclone and ntfsresize on devices specially crafted with the Linux device-mapper (requires dmsetup and losetup); you may save time by first checking that the shrinkage is possible with ntfsresize -n -s SIZE SRC.
The special shrinking technique should be applicable to any filesystem type that has an in-place shrinking command that doesn't write outside the new size. Just change the calls to ntfsclone and ntfsresize; ntfsclone can be replaced by a dd of the beginning of the source for filesystems that don't have a sparse clone command.
git subtree-lite is a tool to manage modified versions of content imported from other git repositories, now deprecated in favor of Braid, which is roughly equivalent but more mature. The source repository remains available for historical interest.