Yesterday I was cleaning up carbon (a database server): removing files from a mounted RAID device that held a clone of the live file system. Cleaning means removing unused/unneeded directories & files.
Well, the location I'm cleaning has a copy of /, so naturally there are directories and files in there like /usr, /etc, /var and others.
The cleaning involves my favourite command, rm -fr – which can be dangerous.
Such as the following sequence:

# cd /mnt/raid/root-copy/
# rm -fr ./etc
# rm -fr ./usr
# rm -fr /var
Shit! Forgot the . prefix on that last command. I hit CTRL+C as fast as I could! Time to see what I killed!
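In hindsight, spelling out the full path under the mount point would have removed the "./" prefix that was so easy to forget. A sketch of that variant, using /tmp/root-copy as a stand-in for the real /mnt/raid/root-copy:

```shell
# Stand-in for the mounted copy of / (the real path was /mnt/raid/root-copy)
mkdir -p /tmp/root-copy/etc /tmp/root-copy/usr /tmp/root-copy/var

# Absolute paths under the mount point: there is no "./" prefix to
# forget, so a slip cannot land on the live /etc, /usr or /var
rm -fr /tmp/root-copy/etc
rm -fr /tmp/root-copy/usr
rm -fr /tmp/root-copy/var
```

Typing the mount prefix three times is tedious, but tedium is cheaper than what follows.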
The most important thing was my database in /var/lib/postgresql. So I was very angry at myself when this happened:
# stat /var/lib/postgresql
stat: cannot stat `/var/lib/postgresql': No such file or directory
shit shit shit
There went my database! All my client records, histories, book-keeping & accounting back to 2004.
My last deliberate backup was from the first of the month – 15 days ago.
Fortunately I had an automated backup script running that had yesterday's data. But what about all the work I did today? I hate repeating myself.
Well, here's a cool trick. On UNIX-style OSes, files are not actually destroyed the moment you rm them. The name is simply unlinked from the directory; the inode and its data blocks stay put. Only once the link count hits zero and every process has closed the file does the kernel consider that space on the disc free and reusable.
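You can watch this happen with nothing more than a shell redirection to hold a file open (paths here are illustrative):

```shell
# Create a file and hold it open on fd 3 in this shell
echo "important data" > /tmp/demo.txt
exec 3< /tmp/demo.txt

# rm unlinks the name; the inode survives while fd 3 stays open
rm /tmp/demo.txt
stat /tmp/demo.txt || echo "the name is gone"

# The data is still readable through the open descriptor
cat <&3                   # prints: important data

# Closing the descriptor drops the last reference; only now can
# the kernel actually free the blocks
exec 3<&-
```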
So I had deleted my /var/lib/postgresql directory, but guess what! PostgreSQL had hundreds of open files in there! Hooray! The postgres process still had access to all those files in that directory!
So, I looked at the process and its open files (/proc/[pid]/fd/) and could see a lot of open files. Hoping for the best, I re-ran my postgres backup script. It had full access to all data in all 14 databases and dumped out my 2.5 GiB worth of book-keeping and company financial records. There is an interesting document on the proc file-descriptors and undeleting from finalcog.
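The same rescue works for any process, not just a backup script with database access. A minimal sketch, with tail standing in for postgres as the process holding a deleted file open (the fd number 0 below is just the stdin we redirect):

```shell
# Simulate the incident: a long-running process holds a file open
echo "precious rows" > /tmp/table.dat
tail -f /dev/null < /tmp/table.dat &
PID=$!
sleep 1                   # let the background process open its fds
rm /tmp/table.dat

# Linux exposes every open inode under /proc/[pid]/fd; deleted
# files show "(deleted)" after the link target
ls -l /proc/$PID/fd

# Copying from the /proc entry reads the still-live inode,
# recovering the "deleted" data
cp /proc/$PID/fd/0 /tmp/recovered.dat
cat /tmp/recovered.dat    # prints: precious rows

kill $PID
```

The crucial caveat: the moment the holding process exits, the inode really is freed – so recover first, restart the service second.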
When I stopped the PostgreSQL server it was dead: it wouldn't restart, and its whole data area had to be re-initialized. Still, I was able to recover the system right back to the point (4am) when I had issued the errant rm -fr command.
For the record, I know that rm -fr is dangerous. I frequently find myself telling others to be careful with it. Remember, rm -fr is shorthand for rm --fuck-it --really
So yeah, I had backups, but they were more than 12 hours old – who wants that? Thanks to open file descriptors I was very, very lucky and able to recover the data to the point-in-time of failure.
BACKUP YOUR DATA!!!