Using Autopsy for recovering a corrupted filesystem

tl;dr: I want to “Extract File(s)” but completely ignore unallocated space and slack. Is there an easy way to do that without writing a new file ingest module?

One of my disks suddenly developed around 20 MB of bad blocks, which corrupted the ext4 filesystem on it so badly that it would no longer mount. I dumped the disk with ddrescue and have spent weeks trying to repair it with all sorts of voodoo and black magic, to no avail. I finally found Autopsy, which is the only tool so far that has been able to read the filesystem’s directory structure. That is extremely important to me: much of the value of what is on the disk lies in the neatly organized directory structure and filenames, which I would lose with file-carving tools like PhotoRec (my last resort).

So I’ve tested Autopsy’s right-click → “Extract File(s)” and it works pretty great! However, I get a lot of junk among the recovered files because Autopsy also looks at slack and unallocated space. Cleaning up the -slack files is trivial, but deleted files extracted from unallocated space are harder to deal with, since they are indistinguishable from files extracted from allocated space. For instance, I find files that I believe originate from the web browser cache littered everywhere, e.g. in “random” directories such as mp3 albums where those files never existed. If those files were only recovered into folders where they had actually existed, that would have been OK, but having them injected into “random” folders is a problem because it diminishes the value of the directory structure, which is what matters most to me.
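For the -slack part, this is roughly the cleanup I mean, as a minimal Python sketch (not tested against a real Autopsy export). It assumes the slack files are simply the extracted files whose names end in “-slack”, and by default it only lists the matches; pass dry_run=False to actually delete them.

```python
from pathlib import Path
from typing import List


def remove_slack_files(export_root: str, suffix: str = "-slack",
                       dry_run: bool = True) -> List[Path]:
    """Find every regular file under export_root whose name ends with `suffix`.

    ASSUMPTION: Autopsy's slack files are named "<original name>-slack".
    With dry_run=True nothing is deleted; the sorted matches are only returned
    so they can be reviewed. With dry_run=False the matches are also unlinked.
    """
    matches = sorted(
        p for p in Path(export_root).rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )
    if not dry_run:
        for p in matches:
            p.unlink()
    return matches
```

Usage would be something like remove_slack_files("/path/to/export") to review the list first, then the same call with dry_run=False (the path is a placeholder).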

Ideally I would like to “Extract File(s)” while completely ignoring unallocated space and slack, but that option doesn’t seem to exist. Or am I mistaken, and there is a setting somewhere that I have missed?

I’ve looked into running an ingest on “All Files and Directories (Not Unallocated Space)”, which sounded promising, but there is no file ingest module that extracts the matched files. I tried the “Interesting Files identifier” with a rule that matches all files and directories, but it seems to ignore that I selected “Not Unallocated Space” for the ingest, because files recovered from unallocated space are also flagged as “Interesting”. Even if I got this to work and it only matched allocated files, I don’t think it would be useful: if I go to the “Interesting Items” listing, select all and choose “Extract File(s)”, the filenames get polluted with a numeric prefix (not so bad, I can easily undo that) and the directory structure is lost (which makes it worthless to me). Also, I have millions of files, so “Select all” would probably just crash Autopsy if I did it on the whole filesystem (I have only tested it on a folder with a few hundred files, most of them recovered from unallocated space).
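Undoing the prefix could look like the following Python sketch. I’m assuming the prefix is a run of digits followed by “-” or “_” (adjust PREFIX_RE if it isn’t), which also means a file whose real name starts that way, like “2019-trip.jpg”, would be wrongly renamed. It works on one folder at a time and skips any rename that would collide with an existing or already-claimed name.

```python
import re
from pathlib import Path
from typing import List, Set, Tuple

# ASSUMPTION: the prefix is digits followed by one "-" or "_".
# The lookahead keeps at least one character of the real name.
PREFIX_RE = re.compile(r"^\d+[-_](?=.)")


def strip_numeric_prefixes(folder: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Strip PREFIX_RE from the names of the files directly inside `folder`.

    Returns (old_name, new_name) pairs for the renames that were performed
    (or, with dry_run=True, would be performed). A rename is skipped if the
    new name already exists or was already claimed by an earlier file.
    """
    renames: List[Tuple[str, str]] = []
    claimed: Set[str] = set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        new_name = PREFIX_RE.sub("", path.name, count=1)
        if new_name == path.name:
            continue  # no prefix, nothing to do
        target = path.with_name(new_name)
        if new_name in claimed or target.exists():
            continue  # collision: leave it for manual review
        claimed.add(new_name)
        if not dry_run:
            path.rename(target)
        renames.append((path.name, new_name))
    return renames
```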

I saw the page about the File Exporter, which can run after “Automated Ingest” jobs, but it requires a Multi-User setup with some kind of server (?), which seems like a chore. And even if I went to the trouble of setting that up, the File Exporter saves the files in an inconvenient way: each file is renamed to some hash, so I would have to parse catalog.json and, for each file, look up its original path, create those folders, copy the file there and rename it.
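For reference, the catalog.json step would amount to something like this Python sketch. I haven’t checked the real catalog.json layout, so the format here is hypothetical: a JSON object mapping each hash-named exported file to a list of its original paths. Entries containing “..” are skipped so nothing gets written outside the destination folder.

```python
import json
import shutil
from pathlib import Path, PurePosixPath
from typing import Dict, List


def rebuild_tree(export_dir: str, catalog_path: str, dest_root: str) -> List[Path]:
    """Copy hash-named exported files back to their original relative paths.

    HYPOTHETICAL catalog format (not checked against the real File Exporter):
        {"<exported file name>": ["/original/dir/file.ext", ...], ...}
    A file listed under several original paths is copied to each of them.
    Returns the destination paths that were written.
    """
    with open(catalog_path, encoding="utf-8") as fh:
        catalog: Dict[str, List[str]] = json.load(fh)

    dest = Path(dest_root)
    written: List[Path] = []
    for exported_name, original_paths in catalog.items():
        source = Path(export_dir) / exported_name
        for original in original_paths:
            # Drop the leading "/" so the path is re-rooted under dest_root.
            parts = [p for p in PurePosixPath(original).parts if p != "/"]
            if not parts or ".." in parts:
                continue  # empty, or would escape dest_root
            target = dest.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            written.append(target)
    return written
```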

If I could just get a list of the files that were recovered from unallocated space, I could use it to delete them after a full “Extract File(s)”. I hoped that “Save table as CSV” could help, since it has the columns Flags(Dir) and Flags(Meta) (what is the difference between them, btw?) that show whether a file is “Unallocated”, but it is not recursive (i.e. the contents of sub-folders are not included), so I can only get a list for a single folder, not for the whole filesystem (and I have tens of thousands of folders).
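If I did export one CSV per folder, merging them would be the easy part, e.g. with this Python sketch. The column names Flags(Dir) and Flags(Meta) come from the table; the path column name (“Location”) is an assumption, so it should be checked against the real CSV header. A row counts as unallocated if either flag column says “Unallocated”.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Set


def unallocated_entries(csv_dir: str,
                        path_column: str = "Location",
                        flag_columns: Iterable[str] = ("Flags(Dir)", "Flags(Meta)")) -> List[str]:
    """Collect paths of rows flagged "Unallocated" from every *.csv under csv_dir.

    ASSUMPTIONS: each CSV has one header row, and `path_column` holds the
    file's path ("Location" is a guess). A row counts if ANY of
    `flag_columns` contains "Unallocated". Results are de-duplicated and sorted.
    """
    columns = tuple(flag_columns)
    hits: Set[str] = set()
    for csv_file in sorted(Path(csv_dir).rglob("*.csv")):
        # utf-8-sig also copes with a byte-order mark at the start of the file.
        with open(csv_file, newline="", encoding="utf-8-sig") as fh:
            for row in csv.DictReader(fh):
                flagged = any("Unallocated" in (row.get(col) or "") for col in columns)
                path = row.get(path_column)
                if flagged and path:
                    hits.add(path)
    return sorted(hits)
```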

At the moment I’m wondering what my options are. I believe it should be possible to write a file ingest module that, for each file, checks .getType() to see whether it was recovered from unallocated space (TskData.TSK_DB_FILES_TYPE_ENUM.UNALLOC_BLOCKS, and maybe UNUSED_BLOCKS too?) and, if so, writes the file path to a report file. Then I would run “Extract File(s)” and use the report file to identify which files were recovered from unallocated space and delete them. But I’d prefer something easier, and I feel my use case isn’t that weird, so I kinda expect the functionality I need to already exist in some shape or form; I just cannot figure out how or where.
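The deletion half of that plan is simple, roughly this Python sketch (the ingest module itself isn’t shown). It assumes the report has one path per line, and that the extracted copy lives at the same relative path under the extraction folder once an optional leading prefix (e.g. an image/volume part, hypothetical here) is stripped. It dry-runs by default and ignores paths that try to escape the extraction folder.

```python
from pathlib import Path, PurePosixPath
from typing import List, Set


def delete_listed_files(report_file: str, extract_root: str,
                        strip_prefix: str = "", dry_run: bool = True) -> List[Path]:
    """Delete the extracted copies of the files listed in report_file.

    ASSUMPTIONS: the report has one path per line; after removing the optional
    `strip_prefix`, each path is relative to extract_root. Paths containing
    ".." are ignored, duplicates are handled once, and listed files that are
    not present under extract_root are skipped. Returns the affected paths.
    """
    root = Path(extract_root)
    doomed: List[Path] = []
    seen: Set[Path] = set()
    with open(report_file, encoding="utf-8") as fh:
        for line in fh:
            listed = line.rstrip("\r\n")
            if strip_prefix and listed.startswith(strip_prefix):
                listed = listed[len(strip_prefix):]
            parts = [p for p in PurePosixPath(listed).parts if p != "/"]
            if not parts or ".." in parts:
                continue  # blank line, or would escape extract_root
            target = root.joinpath(*parts)
            if target in seen or not target.is_file():
                continue
            seen.add(target)
            doomed.append(target)
    if not dry_run:
        for target in doomed:
            target.unlink()
    return doomed
```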

Any tips or ideas?