Yesterday I was cleaning up carbon (a database server): removing files from a mounted RAID device that held a clone of the live file system. Cleaning means removing unused/unneeded directories & files.
Well, the location I'm cleaning has a copy of /, so naturally there are directories and files in there like /usr, /etc, /var and others.
The cleaning involves my favourite command, rm -fr – which can be dangerous.
Such as the following sequence:

# cd /mnt/raid/root-copy/
# rm -fr ./etc
# rm -fr ./usr
# rm -fr /var
Shit! Forgot the . prefix on that last command. I hit CTRL+C as fast as I could! Time to see what I killed!
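In hindsight, spelling out the full path under the mount point would have removed the "./" prefix that was so easy to forget. A sketch of that variant, using /tmp/root-copy as a stand-in for the real /mnt/raid/root-copy:

```shell
# Stand-in for the mounted copy of / (the real path was /mnt/raid/root-copy)
mkdir -p /tmp/root-copy/etc /tmp/root-copy/usr /tmp/root-copy/var

# Absolute paths under the mount point: there is no "./" prefix to
# forget, so a slip cannot land on the live /etc, /usr or /var
rm -fr /tmp/root-copy/etc
rm -fr /tmp/root-copy/usr
rm -fr /tmp/root-copy/var
```

Typing the mount prefix three times is tedious, but tedium is cheaper than what follows.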
The most important thing was my database in /var/lib/postgresql. So I was very angry at myself when this happened:
# stat /var/lib/postgresql
stat: cannot stat `/var/lib/postgresql': No such file or directory
shit shit shit
There went my database! All my client records, histories, book-keeping & accounting back to 2004.
My last deliberate backup was from the first of the month – 15 days ago.
Fortunately I had an automated backup script running that had yesterday's data. But what about all the work I did today? I hate repeating myself.
Well, here's a cool trick. On UNIX-style OSes, files are not actually destroyed the moment you rm them. The name is simply unlinked from the directory; the inode and its data blocks stay put. Only once the link count hits zero and every process has closed the file does the kernel consider that space on the disc free and reusable.
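You can watch this happen with nothing more than a shell redirection to hold a file open (paths here are illustrative):

```shell
# Create a file and hold it open on fd 3 in this shell
echo "important data" > /tmp/demo.txt
exec 3< /tmp/demo.txt

# rm unlinks the name; the inode survives while fd 3 stays open
rm /tmp/demo.txt
stat /tmp/demo.txt || echo "the name is gone"

# The data is still readable through the open descriptor
cat <&3                   # prints: important data

# Closing the descriptor drops the last reference; only now can
# the kernel actually free the blocks
exec 3<&-
```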
So I had deleted my /var/lib/postgresql directory, but guess what! PostgreSQL had hundreds of open files in there! Hooray! The postgres process still had access to all those files in that directory!
So, I looked at the process and its open files (/proc/[pid]/fd/) and could see a lot of open files. Hoping for the best, I re-ran my postgres backup script. It had full access to all data in all 14 databases and dumped out my 2.5 GiB worth of book-keeping and company financial records. There is an interesting document on the proc file-descriptors and undeleting from finalcog.
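The same rescue works for any process, not just a backup script with database access. A minimal sketch, with tail standing in for postgres as the process holding a deleted file open (the fd number 0 below is just the stdin we redirect):

```shell
# Simulate the incident: a long-running process holds a file open
echo "precious rows" > /tmp/table.dat
tail -f /dev/null < /tmp/table.dat &
PID=$!
sleep 1                   # let the background process open its fds
rm /tmp/table.dat

# Linux exposes every open inode under /proc/[pid]/fd; deleted
# files show "(deleted)" after the link target
ls -l /proc/$PID/fd

# Copying from the /proc entry reads the still-live inode,
# recovering the "deleted" data
cp /proc/$PID/fd/0 /tmp/recovered.dat
cat /tmp/recovered.dat    # prints: precious rows

kill $PID
```

The crucial caveat: the moment the holding process exits, the inode really is freed – so recover first, restart the service second.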
When I stopped the PostgreSQL server it was dead: it wouldn't restart, and its whole data area had to be re-initialized. Still, I was able to recover the system right back to the point (4am) when I had issued the errant rm -fr command.
For the record, I know that rm -fr is dangerous. I frequently find myself telling others to be careful with it. Remember, rm -fr is shorthand for rm --fuck-it --really
So yeah, I had backups, but they were more than 12 hours old – who wants that? Thanks to open file descriptors I was very, very lucky and able to recover the data to the point-in-time of failure.
BACKUP YOUR DATA!!!