Many years ago I did not read the documentation for PostgreSQL.
To make my database "more parallel" I started up multiple database servers aimed at the same hapless directory. This does not work. What's worse, it doesn't work, but only after appearing to work for a while.
The multiple PostgreSQL instances will start up, and one will get the lock on the directory necessary to actually operate it as a database. The others will still answer your requests, writing a bunch of tiny little files for each transaction into a tmp folder that I presume gets cleaned up once the lock is acquired. In retrospect, I'm kind of amazed I managed to wreck something this badly; the instance that got the network port must not have been the one that got the database lock.
This insanity ran for a day under reasonably heavy load before I came to my senses and undid it. The problem is that PostgreSQL tries to replay all these transactions when it reboots, and is unable to say their names.
The filenames were long, unpronounceable strings, and there were millions of these files, each one much smaller than its own unique name. Something inside the code was doing a wildcard expansion over them.
$: ls *
would load the millions of unique names into memory, hit the memory max, and throw the server into swap hell. Guess what
$: rm -rf *
does? That's right: it expands the wildcard and tries to load the gigs and gigs of filenames into memory until it hits swap hell, but recursively.
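The culprit here is the shell itself, not `ls` or `rm`: the shell rewrites `*` into the full list of matching names before the command is even executed, so the command receives every filename as an argument up front. A tiny demonstration (three files standing in for millions; `/tmp/glob_demo` is a made-up path for illustration):

```shell
# Fresh demo directory with three files.
rm -rf /tmp/glob_demo
mkdir -p /tmp/glob_demo
touch /tmp/glob_demo/a /tmp/glob_demo/b /tmp/glob_demo/c

# echo never sees the "*" character; the shell has already expanded
# it into three separate arguments before echo runs.
cd /tmp/glob_demo && echo *    # prints: a b c
```

With millions of names, that expansion is exactly the "load every unique name into memory" step described above, paid before the first file is even touched.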
I now know the right answer involves find with arcane flags, but I didn't know how to do it then. Deleting the folder was impossible, but moving it wasn't; mv doesn't need to know about the children. The junk folder lies until the end of time at /var/ryleh, where dead Cthulhu waits dreaming.
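For the record, a sketch of both escape hatches, assuming a hypothetical junk directory at /tmp/junk_demo (small file counts standing in for millions): mv is a single rename that never enumerates the children, and find streams each name to unlink as it walks rather than building one giant in-memory list the way a shell glob does.

```shell
# Simulate the junk directory.
rm -rf /tmp/junk_demo /tmp/junk_demo.graveyard
mkdir -p /tmp/junk_demo
for i in $(seq 1 1000); do touch "/tmp/junk_demo/tx_$i"; done

# Step 1: get it out of the way instantly. One rename, regardless
# of how many files are inside.
mv /tmp/junk_demo /tmp/junk_demo.graveyard

# Step 2, at leisure: delete without any glob expansion. find
# deletes entries as it encounters them instead of collecting
# every name first.
find /tmp/junk_demo.graveyard -mindepth 1 -delete
rmdir /tmp/junk_demo.graveyard
```

The `-delete` flag is supported by GNU and BSD find; on a find without it, `find DIR -mindepth 1 -exec rm -rf {} +` batches arguments safely for the same effect.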