Utilities

Status: intermittently active, parts obsolete; 2005-present (supersedes any conflicting remarks left on this page; see the home page for definitions)

Here I collect utilities I use or previously used on my Linux system. Some of them may work on other unix-like systems. They vary widely in maturity: caveat emptor! Your feedback and contributions are welcome: please email me.

Since 2020-09-02, the actual files are in this git repository for your convenience in keeping your copies up to date and merging modifications, except as indicated on this page. The main documentation is still maintained on this page, although many of the files also have comments in them.

Basic file management

cp2 is an abbreviation for rsync -rltE --chmod=ugo=rwX, i.e., preserve data but use destination default security settings. cp2 has largely replaced cp as my copy command.
chexec makes a file executable or nonexecutable by changing the appropriate x permissions according to the same smart algorithm I contributed to rsync. cleanexec is a wrapper that makes a file executable if and only if file(1) says the data is executable; you might find find . -type f -exec cleanexec {} \; useful.
stat2 shows just about all information about a file except its data (stat, lsattr, getfacl, getfattr; let me know if I missed something!).
My lesspipe.sh script, based on an old version of Fedora's. It supports a number of additional file types, but I haven't kept it up to date with Fedora's. Better options may be available on the web; I wanted to get this out and didn't want to take the time to research other options.
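
The permission rule behind chexec can be sketched in Python. This is a minimal sketch, not chexec's own code: it assumes the algorithm is the one the rsync manual documents for --executability (clearing every x bit to make a file nonexecutable, and setting each x bit whose corresponding r bit is set to make it executable). The function name set_executability is hypothetical.

```python
import stat


def set_executability(mode: int, executable: bool) -> int:
    """Return the permission mode with its x bits adjusted.

    Assumed rule (as documented for rsync --executability):
    - nonexecutable: clear all three x bits;
    - executable: set each x bit whose matching r bit is set.
    """
    all_x = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if not executable:
        return mode & ~all_x
    # Within each octal digit r is 4 and x is 1, so shifting the r bits
    # right by two positions lines them up with the x bits.
    readable = mode & (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return mode | (readable >> 2)


print(oct(set_executability(0o640, True)))
print(oct(set_executability(0o755, False)))
```

The sketch only computes the new mode; applying it to a real file would take an os.stat call to read the current mode and an os.chmod call to write the result.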

RPM software management

Maintenance of traditional mutable root filesystems

I use these tools to help manage my traditional Fedora system on which RPM transactions mutate the root filesystem over time. They have some limitations and make some assumptions specific to my setup, but the latter should be easy to change. I'd like to move to a system that generates the root filesystem reproducibly from a specification, likely reusing code or at least ideas from Fedora Silverblue (Silverblue doesn't appear to do everything I need out of the box), which would make these tools obsolete. In the meantime, rpm-audit keeps at least the set of installed packages reproducible (with minor caveats). With the packaged versions of configuration files saved by rpmconf-matt, it wouldn't be hard to write a tool that diffs the root filesystem against the packaged state, though I haven't done so yet. (If you do, let me know!) However, the problem remains that scriptlets mutate things from their packaged state, so one has to know what changes from scriptlets are expected (or what expected changes from scriptlets are missing!). Efficient re-running of scriptlets (ideally incremental, if there's any way to achieve that) is a major problem that any tool for reproducible generation of the root filesystem would have to solve.

rpmconf-matt: Copies the original versions of RPM-managed configuration files to *.rpmbase and invokes a three-way merge tool to resolve *.rpmsave and *.rpmnew files. (Read the security notice at the top of the file before using.) This has largely removed what for many years was a huge pain of comparing two versions of a configuration file and researching which version to keep at each difference, though there are still cases in which a scriptlet changes something in a configuration file and I have to research whether I should keep the change. The tool is named rpmconf-matt to distinguish it from the rpmconf in Fedora, which does not support three-way merges and thus provides little benefit in my view. I had hoped to get the three-way merge functionality cleaned up and added to rpmconf and thereby available in Fedora, but I haven't gotten around to it. Apparently dpkg has an analogous tool ucf --three-way, and some other package managers may as well.
rpm-overrides-matt: Same idea as rpmconf-matt for when I need to modify an RPM-managed file that is not marked as configuration and thus is just overwritten by RPM when the package is upgraded. Less mature than rpmconf-matt. On a dpkg-based system, a diversion would stop dpkg from overwriting the file, but a three-way merge tool would still be needed if the administrator needs to actually merge changes rather than just reference the original file via an include or the like.
rpm-audit: Verifies that the sets of installed and "userinstalled" (per dnf) packages match what one would get by installing all the packages and provides listed in a wants file on an empty system.

The simple system-update script brings together all three tools to do a complete system upgrade, audit, and file merge.

If you want to install some packages from an updates-testing or similar repository (or even a different Fedora release with dnf --releasever=X) without permanently enabling the whole repository, you'll need to dnf download the packages to a permanently enabled local repository in order to pass rpm-audit. See the next subsection for a consideration about doing that.

Downloading all RPMs built from a given SRPM

If you copy RPMs into your own repository from a repository that you don't enable by default on your system, it's best to copy the complete set of packages built from each SRPM. Otherwise, if you later install another package built from the same SRPM and forget to add it to your repository first, you may accidentally install mismatching RPMs. Unfortunately, as of this writing (2020-09-18), dnf repoquery does not have a built-in option to list all RPMs built from given SRPMs, but you can use this dnf-repoquery-by-srpm tool instead. So the command to download all packages built from given SRPMs to the current directory would be something like:

dnf download [OPTIONS] $(dnf-repoquery-by-srpm [OPTIONS] N-V-R.src.rpm [N2-V2-R2.src.rpm ...])

Note that options like --enablerepo or --releasever must be passed in two places: once to dnf download and once to dnf-repoquery-by-srpm.

Alternatively, if you want an entire Fedora update (which may include RPMs built from more than one SRPM), you can use dnf updateinfo --list, but many third-party repositories do not have an analogous update system that publishes metadata for dnf updateinfo.

RPM building against the local dnf configuration

If you want to build custom RPMs locally, I highly recommend Mock (available in Fedora in the mock package). It builds in an isolated environment, helping you achieve functional reproducibility. (Note that achieving bit-for-bit reproducibility often requires much more legwork. Also, Mock is not designed to protect your system from malicious build inputs; I recommend using a separate Qubes OS VM for that.)

However, an obstacle you're likely to encounter, at least in Fedora, is that by default, Mock builds against a standard dnf configuration for the Fedora repositories. You'd probably prefer to build against your own system's dnf repository configuration, which might include third-party repositories or even your own previously built custom RPMs. For RPM source repositories formatted like the official Fedora ones, you can achieve this with this Mock configuration file, which should be placed in /etc/mock. You can select it by passing -r host to Mock or by symlinking /etc/mock/default.cfg to host.cfg. If you are using fedpkg mockbuild, you must pass --mock-config host, since fedpkg's default choice of Mock configuration is based on the package rather than the system default.

In the past, I've used a similar Mock configuration to build custom Qubes OS RPMs for VMs rather than using the official qubes-builder, which works somewhat differently. If and when I need that configuration again and bring it up to date, I will post it here. (Building RPMs for dom0 requires a different process since your VM repository configuration won't match dom0's.)

As a reminder, in addition to using one of these configuration files, you probably want to set the following in your ~/.config/mock.cfg:

config_opts['macros']['%packager'] = 'YOUR_NAME <YOUR_EMAIL>'

Web log analysis

Here are the tools I use to analyze the server logs for this web site to understand how people are using the site and prioritize improvements, since a quick web search didn't find another tool that did what I wanted. Notable features:

Overview file (generated/requests) that groups requests by HTTP response status code, then by request URL (sorted by descending count), then by referrer (sorted by descending count). Grouping by status code separates site problems and obvious abuse from successful requests, which makes site problems easier to spot and keeps abuse from being a distraction while reviewing successful requests. Among requests from a given client IP address, consecutive identical requests (e.g., successive range requests for a streaming video) are counted as one.
Log is re-grouped by client IP address (generated/logs-by-ip) so you can see the sequence of requests from one IP address to understand how the user navigated the site. (I know an IP address is an imperfect indicator of a user, but I haven't bothered to set a cookie, and my current web host doesn't give me an easy way to include cookies in the access log anyway.)
To focus on requests from real users, the tools exclude bots (a configurable list of user agents) and a configurable list of client IP addresses that you observe to have overwhelmingly bot-like behavior.
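
The grouping and de-duplication described in the first feature can be sketched as follows. This is an illustration under stated assumptions, not the actual tool: it assumes the log has already been parsed into (ip, status, request, referrer) tuples in log order, and it treats two requests as identical when status, request, and referrer all match. The names build_overview and format_overview are hypothetical.

```python
from collections import Counter, defaultdict


def build_overview(records):
    """Group parsed log records by status, then request, then referrer.

    `records` is an iterable of (ip, status, request, referrer) tuples in
    log order (a hypothetical pre-parsed form). Consecutive identical
    requests from the same IP address are counted once.
    """
    last_seen = {}  # ip -> last (status, request, referrer) from that ip
    overview = defaultdict(lambda: defaultdict(Counter))
    for ip, status, request, referrer in records:
        key = (status, request, referrer)
        if last_seen.get(ip) == key:
            continue  # e.g. successive range requests for one video
        last_seen[ip] = key
        overview[status][request][referrer] += 1
    return overview


def format_overview(overview):
    """Render the overview in a layout similar to the sample output."""
    lines = []
    for status in sorted(overview):
        lines.append(f"=== STATUS CODE {status} ===")
        # Most-requested URLs first.
        by_request = sorted(
            overview[status].items(), key=lambda kv: -sum(kv[1].values())
        )
        for request, referrers in by_request:
            total = sum(referrers.values())
            if len(referrers) == 1:
                # Concise form when every request has the same referrer.
                (referrer,) = referrers
                lines.append(f'{total:4d} "{request}" "{referrer}"')
                continue
            lines.append(f'{total:4d} "{request}"')
            for referrer, count in referrers.most_common():
                lines.append(f'  {count:4d} "{referrer}"')
    return "\n".join(lines)
```

Filtering out bot user agents and bot-like IP addresses would happen before records reach build_overview, so the sketch leaves it out.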

Sample overview output (some portions omitted and comments added):

=== STATUS CODE 200 ===
  86 "GET /site/style.css"
    64 "https://mattmccutchen.net/bigint/"
     9 "https://mattmccutchen.net/"
...
  77 "GET /bigint/"
    40 "-"
    23 "https://www.google.com/"
...
   # More concise output if all requests for the same URL have the same referrer
   4 "GET /escape/icon.png" "https://mattmccutchen.net/escape/"
...
=== STATUS CODE 404 ===
...
   2 "GET //wordpress/wp-includes/wlwmanifest.xml" "-"    # abuse
...
   # Oops... site misconfiguration fixed on 2020-09-16
   1 "GET /app-downloads/escapesetup-windows-201609050.mattmccutchen202008280.exe" "https://mattmccutchen.net/escape/"
...

The obvious site-specific parameters are taken from a separate configuration file, but there are plenty of other assumptions specific to my setup that you may need to change in order to use this.

Stow

My modified version of Stow 1.3.3 (very old) with the following enhancements:

If a target directory wasn't explicitly passed, check that it looks like you're in a stow directory to avoid trashing an unrelated parent directory. Currently checks that the basename of the stow directory is stow; should probably be updated to check for a .stow file. May be worth upstreaming.
Run scripts after stowing: update-mime-database and the like. May be worth upstreaming.
A package can contain a .dontfold file that ensures its containing directory is unfolded even if that package is the only one contributing content. Back in 2009, I converted Fedora's filesystem package to a Stow package of .dontfold files to help ensure that an accidental make install into the target directory wouldn't create files in folded directories of random packages, which would be difficult to track down and remove. However, I've never had the motivation to keep this Stow package up to date, so I'm thinking upstream's new --no-folding option may be better even if it's expensive.

`retex`

retex is a TeX wrapper script that makes TeX compiling fit more nicely into build processes. For example, it exits nonzero immediately if an error occurs, and it repeats until a fixed point is reached in order to handle LaTeX references correctly. Better tools may exist.
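
The "repeat until a fixed point is reached" idea can be sketched in Python. This is a minimal sketch and not retex's implementation: it assumes the state worth watching is a single .aux file, that pdflatex is the engine, and that five runs is a sensible cap. The names file_digest, run_until_stable, and latex_once are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file's contents; a missing file hashes like an empty one."""
    data = path.read_bytes() if path.exists() else b""
    return hashlib.sha256(data).hexdigest()


def run_until_stable(compile_once, aux_path: Path, max_runs: int = 5) -> int:
    """Call compile_once() until aux_path stops changing.

    compile_once must raise on failure, so an error ends the loop at once.
    Returns the number of runs performed.
    """
    before = file_digest(aux_path)
    for run in range(1, max_runs + 1):
        compile_once()
        after = file_digest(aux_path)
        if after == before:
            return run  # fixed point: this run changed nothing
        before = after
    raise RuntimeError(f"{aux_path} did not stabilize after {max_runs} runs")


def latex_once(tex_file: str) -> None:
    """Run pdflatex once, raising CalledProcessError on a TeX error."""
    # -halt-on-error stops the engine at the first error; check=True turns
    # the resulting nonzero exit status into an exception.
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_file],
        check=True,
    )
```

A caller might use run_until_stable(lambda: latex_once("paper.tex"), Path("paper.aux")), where paper.tex is a placeholder file name.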

`ftc`, `ftx`

ftc packages a file tree in a single file of a simplistic format that I designed, and ftx extracts such package files. They have the same purposes as tar -c and tar -x respectively but have no bells or whistles. They handle binary files safely, but a package of only text files is itself a text file.

`gitar`

gitar ("git archive") uses the git backend to make really small packages out of file trees with lots of redundancy. ungitar unpacks the packages; it requires ftx.

A .gitar package consists of an ftc package containing a bare git repository whose HEAD is the original tree and whose objects are all stored in a single pack. Hence, git will represent similar files in the original tree as deltas.

gitar is great for compressing together several versions of the same piece of software. I had seven versions of my custom rsync lying around, each about 585 KB as a tar-bz2 package. I unpacked them all inside a single folder and gitared that folder; the resulting gitar package was only 865 KB. Of course, if you can be bothered to import the sequence of versions into git as a proper sequence of commits, that's much better.

Note (2008-06-01): A while ago, git gained a standardized binary format for "bundles"; I should change gitar and ungitar to use bundles rather than my ftc format.

`patchsync`

patchsync synchronizes a trunk, a branch, and a patch that contains the differences between the two. If the trunk or patch changes, it updates the branch; if the branch changes, it updates the patch. I developed patchsync to help me follow branches of rsync, but I no longer use it for that purpose. Depending on your situation, you may prefer a more sophisticated patch-management tool such as StGIT.
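
The synchronization rule can be sketched as a decision over content hashes of the three areas, in the spirit of the hash-based state introduced in version 2. This is a hedged illustration, not patchsync's logic: the Action names and the decide function are hypothetical, and reporting a conflict when the branch changes together with the trunk or patch is an assumption, since this page does not say how patchsync handles that case.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "nothing to do"
    UPDATE_BRANCH = "apply the patch to the trunk to regenerate the branch"
    UPDATE_PATCH = "diff the trunk against the branch to regenerate the patch"
    CONFLICT = "both sides changed; resolve by hand (assumed behavior)"


def decide(saved: dict, current: dict) -> Action:
    """Compare saved and current hashes of 'trunk', 'patch', and 'branch'."""
    changed = {
        name
        for name in ("trunk", "patch", "branch")
        if saved[name] != current[name]
    }
    if not changed:
        return Action.NOTHING
    if changed == {"branch"}:
        return Action.UPDATE_PATCH
    if "branch" not in changed:
        # Only the trunk and/or the patch changed.
        return Action.UPDATE_BRANCH
    return Action.CONFLICT


saved = {"trunk": "t1", "patch": "p1", "branch": "b1"}
print(decide(saved, {"trunk": "t2", "patch": "p1", "branch": "b1"}).name)
```

Storing hashes rather than full copies keeps the saved state small, at the cost of only knowing that an area changed and not how it changed.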

To set up a patchsync staging directory, run:

patchsync --new trunk patch branch where-to-create-staging

Then, to synchronize, run:

patchsync staging

Read the gigantic comment at the top of patchsync for much more information.

Version log

"Original" version (retroactively numbered 1) (2006.09.03).
Version 2 (2006.12.14): I essentially rewrote patchsync and made many enhancements. Two of them are big: patchsync can do most of the work of creating a staging directory for you, and the synchronization state consists of hash codes of the trunk, branch, and patch instead of full copies. The staging directory format has changed. To migrate: synchronize with the old patchsync, delete the staging directory, and create a new one using the new patchsync.
Version 2.1 (2006.12.15): Patchsync now erases patch-new and branch-new even if nothing changed. So if your areas are in sync but you force patchsync to update one of them on a dry run, you can clean up by running patchsync again. I forgot to update the version number in the file to 2.1.
Version 2.2 (2006.12.16): I fixed the patch-work file names in the cmp tests during copying out; that means patchsync will correctly skip copying the patch out if its data didn't actually change. I also factored the version number out of the --version message so there's one less thing I can forget to update.
Version 2.3 (2007.01.07): Three minor improvements: define cp2 to exec rsync to reduce the number of processes created, add ./ in . ./settings to avoid searching the path for programs called settings, and update some comments.
Version 2.4 (2007.07.05): Fix a serious bug in wdpp_from that broke link-dest when copying the branch out and "patchsync --new" when the staging directory wasn't inside the current directory. Strengthen the weak error checking that allowed this bug to go unnoticed for so long. Fix handling of absolute trunk, patch, and branch paths in "patchsync --new". Remove some old comments.

From 2006.12.16 to 2006.12.24, a development version of patchsync was mistakenly identified as version 2.2. As of 2006.12.24, the real version 2.2 is posted.

`xsltdepcomp`

xsltdepcomp (named after depcomp) runs an XSL transform with xsltproc and generates a dependency makefile for the files read, thanks to xsltproc --load-trace. It is used in the build system for this web site.

`continusync`

continusync is a Perl script that wraps inotifywait and rsync to perform continuous mirroring, as suggested by Buck Huppmann. It is currently experimental and rather inefficient, but it does appear to work in simple cases. If you want to use it, I would be much obliged if you improved it as necessary and sent me the improved version.

Isolated Firefox

firefox-isolated is a Firefox wrapper script that creates and uses a disposable profile. You can make it harder for people to correlate your activities across multiple Web sites by browsing each site with a separate Firefox profile created by this script. This script was inspired by the Facebook Beacon outrage.

To use this script, install it in your $PATH and name your master Firefox profile (from which the disposable ones will be copied) 00000000.master; then run firefox-isolated. Your mileage may vary.
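
The disposable-profile idea can be sketched in Python. This is a rough sketch, not the firefox-isolated script: it assumes the caller supplies the path of the master profile directory, that the copy can live in a temporary directory, and that the installed Firefox accepts the --profile and --no-remote options. The function names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def make_disposable_profile(master: Path) -> Path:
    """Copy the master profile into a fresh temporary directory."""
    parent = Path(tempfile.mkdtemp(prefix="firefox-isolated-"))
    profile = parent / "profile"
    shutil.copytree(master, profile)
    return profile


def run_isolated(master: Path) -> None:
    """Browse with a throwaway copy of the master profile, then delete it."""
    profile = make_disposable_profile(master)
    try:
        # --profile selects a profile directory; --no-remote keeps this
        # instance from attaching to an already running Firefox.
        subprocess.run(
            ["firefox", "--no-remote", "--profile", str(profile)],
            check=False,
        )
    finally:
        shutil.rmtree(profile.parent, ignore_errors=True)
```

Deleting the copy after Firefox exits is a design choice of this sketch; keeping one copy per site instead would preserve logins between sessions while still separating sites from each other.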

Last update 2007-12-02: Initial posting. Seems to work.

As of 2020-09-02, Firefox containers are much more convenient to use, although it's possible there are some Firefox features (perhaps some of these?) that containers and the like do not yet properly isolate but a separate profile would.

ntfsresizecopy

ntfsresizecopy copies an NTFS filesystem from one block device to another, resizing it to the size of the destination device in the process. (It uses ntfsprogs.) This is EXPERIMENTAL; after using this script, you should mount the destination read-only and check that everything looks intact.

An expanding copy is done simply with ntfsclone followed by ntfsresize. A shrinking copy is done by running ntfsclone and ntfsresize on devices specially crafted with the Linux device-mapper (requires dmsetup and losetup); you may save time by checking first that the shrinkage is possible with `ntfsresize -n -s SIZE SRC`.

The special shrinking technique should be applicable to any filesystem type that has an in-place shrinking command that doesn't write outside the new size. Just change the calls to ntfsclone and ntfsresize; ntfsclone can be replaced by a dd of the beginning of the source for filesystems that don't have a sparse clone command.

Change log

2008-06-01: Initial posting. Seems to work.

git subtree-lite

git subtree-lite is a tool to manage modified versions of content imported from other git repositories, now deprecated in favor of Braid, which is roughly equivalent but more mature. The source repository remains available for historical interest.

Modification time of this page's main source file: 2020-09-18 19:19:44 +0000

Except where otherwise noted, Matt McCutchen waives his copyright to the content of this site. This site comes with absolutely no warranty.