Yesterday I was cleaning up carbon (a database server) and was attempting to remove files from a mounted raid device that had a clone of the live file system. Cleaning involves removing unused/unneeded directories & files.
Well, the location I’m cleaning has a copy of /, so naturally there are directories and files in there like /usr, /etc, /var and others.
The cleaning involves my favourite command, rm -fr, which can be dangerous.
Such as the following sequence:
# cd /mnt/raid/root-copy/
# rm -fr ./etc
# rm -fr ./usr
# rm -fr /var
Shit! Forgot the . prefix on that last command. I hit CTRL+C as fast as I could! Time to see what I killed!
The most important thing was my database in /var/lib/postgresql.
So I was very angry at myself when this happened:
# stat /var/lib/postgresql
stat: cannot stat `/var/lib/postgresql': No such file or directory
shit shit shit
There went my database! All my client records, histories, book-keeping & accounting back to 2004.
My last conscious backup was from the first of the month – 15 days ago.
Fortunately I had an automated backup script running that had yesterday’s data. But what about all the work I did today? I hate repeating myself.
Well, here’s a cool trick. On UNIX-style OSes a file’s data is not actually destroyed the moment you say rm. The command only removes the file’s name from its directory and drops the link count in the inode. As long as some process still has that file open, the inode and its data blocks stay allocated. Once all processes close the file, the inode (now with reference count zero) is released, and only then is that space on the disc free to be reused.
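If you want to see this for yourself without risking anything, here is a minimal sketch. It assumes a POSIX shell and works only inside a throwaway directory from mktemp; the file name ledger.txt and the variable names are made up for the demo.

```shell
#!/bin/sh
# Demo only: everything happens inside a temporary directory.
workdir=$(mktemp -d)
printf 'book-keeping\n' > "$workdir/ledger.txt"

# Hold the file open on descriptor 3 of this shell.
exec 3< "$workdir/ledger.txt"

# Remove the name. Only the directory entry goes away.
rm -f "$workdir/ledger.txt"
ls "$workdir"            # prints nothing: the name is gone

# The open descriptor still reaches the data.
still_there=$(cat <&3)
echo "$still_there"      # prints: book-keeping

# Closing the last descriptor lets the file system reclaim the space.
exec 3<&-
```

The name disappears straight away, but the contents stay readable until descriptor 3 is closed.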
So I had deleted my /var/lib/postgresql directory, but guess what! PostgreSQL had 100s of open files in there! Hooray! And the postgres process still had access to all those files in that directory!
So, I looked at the process and its open files (/proc/[pid]/fd/) and could see a lot of open files. Hoping for the best, I re-ran my postgres backup script. It had full access to all data in all 14 databases, and dumped out my 2.5GiB worth of book-keeping and company financial records. There is an interesting document on the proc file-descriptors and undeleting from finalcog.
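For the curious, this is roughly what looking under /proc/[pid]/fd/ and pulling a deleted file back out looks like. It is a sketch, not what I ran that night: it is Linux-only, the background sleep is a stand-in for the postgres process, and table.dat and recovered.dat are invented names in a temporary directory.

```shell
#!/bin/sh
# Linux-only sketch: needs /proc. All paths are throwaway demo paths.
workdir=$(mktemp -d)
printf 'client records\n' > "$workdir/table.dat"

# Open the file in this shell, let a background process inherit the
# descriptor, then close our own copy. The sleep stands in for a
# server that still has its data files open.
exec 3< "$workdir/table.dat"
sleep 30 &
holder=$!
exec 3<&-

# Delete the name while the background process still holds the file.
rm -f "$workdir/table.dat"

# The descriptor shows up as a symlink whose target ends in "(deleted)".
ls -l "/proc/$holder/fd" | grep '(deleted)'

# Reading through /proc reopens the unlinked inode, so the data can be copied out.
cat "/proc/$holder/fd/3" > "$workdir/recovered.dat"

kill "$holder"
recovered=$(cat "$workdir/recovered.dat")
echo "$recovered"        # prints: client records
```

Copying files out like this only gives a consistent result if the holder is not writing to them at the time, which is why going through the database's own dump tool was the safer route for live PostgreSQL data.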
When I stopped the PostgreSQL server it was dead. It wouldn’t restart and had to have its whole data area re-initialized. I was able to recover the system right back to the point (4am) when I had issued the errant rm -fr command.
For the record, I know that rm -fr is dangerous. I frequently find myself telling others to be careful with it. Remember, rm -fr is shorthand for rm --fuck-it --really
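One way to take the missing-dot mistake off the table is a small wrapper that refuses absolute paths. This is a hypothetical helper I am calling rm_under, not a feature of rm, and the demo runs against a temporary directory standing in for /mnt/raid/root-copy.

```shell
#!/bin/sh
# Hypothetical helper, not part of rm: delete "$2" only when it is a
# relative path without "..", always resolved against the root "$1".
rm_under() {
    root=$1
    target=$2
    case $target in
        /*|*..*)
            echo "rm_under: refusing '$target'" >&2
            return 1
            ;;
    esac
    # ${root:?} aborts if the root is empty, so this can never become "/...".
    rm -fr -- "${root:?}/$target"
}

# Demo on a throwaway directory standing in for the mounted copy.
copy=$(mktemp -d)
mkdir -p "$copy/etc" "$copy/var"
rm_under "$copy" ./etc                            # removes the copy's etc
rm_under "$copy" /var || echo "second call refused"
```

The second call is the one that bit me: with the wrapper, a forgotten dot is rejected instead of being handed to rm.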
So yeah, I had backups, but they were more than 12 hours old, and who wants that? Thanks to open file-descriptors I was very, very lucky and able to recover the data to the point in time of the failure.
BACKUP YOUR DATA!!!