Sunday, July 31, 2022

2 Years of My Life Truncated, Then Recovered

I started recording data about my daily activities (workouts, etc.) a few years ago. At first I didn't think of it as anything serious, but as time passed I accumulated over two years of data and notes in a simple text file.

Disaster struck a week ago on an early Sunday morning.

I had just woken up and I was saving my notes file, but I was low on disk space so the write was not completing and the computer seemed to be hung. Wanting to get rid of the dreaded spinning beachball, I powered off and rebooted. After the reboot, I found that the file was truncated to zero bytes. In disbelief, I scrolled up and down. Nothing. All content was lost.

By then, I was fully awake.

I back up my drives infrequently, but in my defense, the most important stuff on my computers happens to be code. All my code is saved not just on my local machines, but also to GitHub and to external servers, so additional backups are not super critical.

When I found out that the backup copies I had made of this particular file were missing seven months' worth of data, my heart sank. I did not have a recent backup copy. Two words stung: "Time Machine."

I had to get this file back.

I stopped using the computer so that any recoverable data would not be overwritten with new files. On another computer, I downloaded a tool called "Disk Drill" to a USB drive and then ran it from USB on the machine with the lost file.

Disk Drill says that the free version will show you what can be recovered, then you pay something north of $100 to get the version that will recover the data.

As it turned out, that wasn't true. Disk Drill reported that it could recover some files, but when I went to view what would be recovered, it said I had to pay for the full version to do that.

I decided to take another approach.

When a file is deleted (or truncated in my case), the data that was in that file still exists on the disk. The operating system's filesystem doesn't view that data as part of any file. Instead, it views those sectors of the disk as unallocated and fair game for writing new data to. But if you can avoid writing over them, and have a way to access the raw data from the disk, then you can still recover some or all of the data.

So as a next step, I booted the machine into recovery mode so that there would be minimal activity writing to the disk. I then ran the following command:

cat /dev/disk0s2 | perl -pe 's/[^[A-Za-z0-9\n\r \/,#i():.?*]//g' | grep -A15000 "DO NOT FORGET" >> /Volumes/THUMB/recover2.txt

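The cat command reads the raw contents of the /dev/disk0s2 device and sends them to stdout. I piped that into a small perl one-liner that strips every character outside a short whitelist (letters, digits, spaces, line breaks, and a handful of punctuation marks), which gets rid of the binary noise.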
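
To see what the filtering step does without touching a real disk, here is a minimal sketch that runs a similar whitelist over a few made-up bytes. It uses tr rather than the perl one-liner from the command above, and keep_text is a hypothetical helper name, not part of the original command.

```shell
# Hypothetical helper: keep only a whitelist of bytes read from stdin.
# LC_ALL=C makes tr treat the input as raw bytes rather than multibyte text.
keep_text() {
  LC_ALL=C tr -cd 'A-Za-z0-9\n\r /,#():.?*'
}

# Made-up sample: the marker text with two binary bytes mixed in.
printf 'DO NOT\001 FORGET\377\n' | keep_text
# prints: DO NOT FORGET
```

Because tr works byte by byte, it does not care how long the stretches between newlines are, which may matter when the input is a whole partition.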

I piped that output into a grep command that looks for some text I knew was at the very beginning of the lost file. I executed grep with the -A15000 option to tell it to print the next 15000 lines after the matching text, which I knew was long enough to cover the length of my lost file.
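
The grep step can be tried on a small sample first. In this sketch, after_marker is a hypothetical wrapper and the sample text is made up. The -F flag is an addition that makes grep treat the marker as a fixed string instead of a regular expression.

```shell
# Hypothetical wrapper: print each line containing the marker ($1)
# plus the next $2 lines, reading from stdin.
after_marker() {
  grep -F -A "$2" -- "$1"
}

# Made-up sample: noise, then the marker, then a few entries.
printf 'noise\nDO NOT FORGET\nentry 1\nentry 2\nentry 3\n' \
  | after_marker 'DO NOT FORGET' 2
# prints the marker line followed by "entry 1" and "entry 2"
```

If the marker shows up more than once, grep typically prints a line containing only -- between groups of output that are not adjacent, so the recovered text may contain a few of those separators.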

Finally, I directed the output of grep to be stored on a USB thumb drive, since I didn't want any writes to the main disk.

This ran for more than 24 hours, but in the end it recovered several versions of my file, each missing a small amount of data at the end. In the most complete version of the file, I was missing only four days' worth of data (the entire file covers more than two years' worth of entries).
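
Since the marker line starts every copy, one way to compare recovered versions is to split the output at each occurrence of the marker and look at the sizes. This is only a sketch of how that could be done, not necessarily how the versions above were compared: split_candidates, the candidate_NNN.txt names, and the sample data are all hypothetical, and file size is just a rough proxy for completeness.

```shell
# Hypothetical helper: split a recovery dump into one file per marker occurrence.
# $1 = dump file, $2 = marker text, $3 = output directory.
split_candidates() {
  mkdir -p "$3"
  awk -v marker="$2" -v dir="$3" '
    # A line containing the marker starts a new candidate file.
    index($0, marker) {
      if (out) close(out)
      n++
      out = sprintf("%s/candidate_%03d.txt", dir, n)
    }
    # Lines before the first marker are skipped; the rest go to the current file.
    out { print > out }
  ' "$1"
}

# Made-up dump with two copies of different lengths.
work=$(mktemp -d)
printf 'junk\nDO NOT FORGET\na\nb\n--\nDO NOT FORGET\na\nb\nc\nd\n' > "$work/recover.txt"
split_candidates "$work/recover.txt" 'DO NOT FORGET' "$work/out"

# Byte counts per candidate; the largest is a reasonable one to inspect first.
wc -c "$work"/out/candidate_*.txt
```

Any -- separator lines from grep end up at the tail of the preceding candidate, so they are easy to spot and trim by hand.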

Image: My own
