Large Ingestion Performance

autopsy_user · June 2, 2022, 1:13pm

Hi,

We have a relatively large forensic project and wanted to ask for advice about Autopsy case size and performance.

Our goal is to analyze the contents of a large file server (logical files) to identify instances of approx. 20 keywords and 1 regex. We’re using embedded file extraction, email parsing, and keyword searching.

The contents are ~3.5 TB in >1M files. Thousands of the files are MS Outlook PST files, some up to ~50 GB, and many contain tens or hundreds of thousands of email messages.

Ideally we’d like to complete ingestion and keyword searching within a few weeks.

A few questions about Autopsy’s performance/limits:

1. Is there a practical limit on Autopsy case size if ingestion is to complete within a few weeks?
2. If Autopsy is interrupted during ingestion (e.g. out of disk space, system reboot, etc.), is there a way to resume ingestion at the point where it stopped rather than starting over?
3. Is there a way to get a list of the file names that have been ingested so far while it’s running?
4. Any other ideas or advice about a project like this?

Thanks!

fancy_flare · June 7, 2022, 12:16am

I’m just posting to follow as I’d be interested in this as well.

lfcnassif · June 9, 2022, 5:09am

I don’t know about Autopsy’s limits, but if you would like to try IPED, an open source (GNU GPL v3) forensic tool (GitHub: sepinf-inc/IPED), here are the answers for it:

1. 2^31 (~2 billion) files in a case, although the main UI table tab can list up to ~135 million files at once (you probably won’t hit that limit when doing keyword searches).
2. You can use the --continue command-line option.
3. Just open the UI, apply the actual files filter, and export the file properties.
4. See Performance Tips in the sepinf-inc/IPED wiki on GitHub.

Hope this helps with handling your huge case.

nika · June 9, 2022, 4:26pm

The only tip I can suggest, based on previous experience, is to ingest with one or two modules at a time. Don’t run them all at once, and restart Autopsy between runs. But this also depends on your system.

EDIT:
After thinking through this again, I wanted to add a bit more data:

I regularly ingest large data sets, both images and logical files. I am usually in single-user mode. For large data sets, it seems to be good practice to add the data source without running any ingest modules, or running only a few.

I recently ingested a large number of MBOX (Thunderbird) files. I made the mistake of running too many modules and also started viewing the case mid-ingest. This caused Autopsy to freeze up, and I had to restart the ingest.

So, in summary, a large dataset like that could take over 24 hours, depending on the system, the drive speed(s), amount of RAM, processing power, etc. But analyzing it in a “gradient approach” makes it a bit easier.
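To plan which modules to run first, it may help to know how the data breaks down before adding it as a data source. The following is only a minimal sketch, not anything Autopsy provides: the function name `survey` is made up for illustration, and it assumes the file server contents are reachable as an ordinary directory tree (ideally a working copy or read-only mount).

```python
import os
from collections import defaultdict


def survey(root):
    """Return {extension: [file_count, total_bytes]} for all files under root.

    Hypothetical helper for planning only; it reads file sizes and
    does not open or modify any file.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            # Normalise case so ".PST" and ".pst" are counted together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += size
    return dict(totals)
```

Calling `survey` on the top-level folder and sorting the result by total bytes should show, for example, how much of the data set is `.pst`, which is a hint about how long the email parsing stage might run compared with the rest.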

I hope I didn’t write too much and that this helps.

jgalbraith · June 9, 2022, 4:52pm

For one of my cases, I ingested a 1 TB image and, if I remember correctly, it took roughly 24 hours. I was using a multi-user setup and was not using every analyzer for the ingest. While the ingest was running, I used a secondary host that also had Autopsy installed to view the case for preliminary analysis.
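As a rough back-of-envelope check on the numbers in this thread, the sketch below works out the average rate needed to get through 3.5 TB in three weeks, and how long 3.5 TB would take at the roughly 1 TB per 24 hours mentioned above. It assumes decimal terabytes and perfectly linear scaling, which is optimistic: a disk image on a multi-user setup is not the same workload as logical files with thousands of large PSTs, so treat the result as a lower bound rather than a prediction.

```python
TB = 10**12  # assumption: decimal terabytes; the thread does not say TB vs TiB
SECONDS_PER_DAY = 86_400


def required_rate_mb_s(total_bytes, days):
    """Average sustained rate in MB/s needed to finish within `days`."""
    return total_bytes / (days * SECONDS_PER_DAY) / 10**6


def days_at_rate(total_bytes, bytes_per_day):
    """Days needed if throughput scales linearly (a naive assumption)."""
    return total_bytes / bytes_per_day


dataset = 3.5 * TB          # size quoted in the original question
observed_per_day = 1 * TB   # ~1 TB image in ~24 hours, from the post above

print(f"Needed for 3 weeks: {required_rate_mb_s(dataset, 21):.2f} MB/s")
print(f"At ~1 TB/day: {days_at_rate(dataset, observed_per_day):.1f} days")
```

On these assumptions the required average is only about 2 MB/s, so raw read speed is unlikely to be the limit; time spent on embedded file extraction, email parsing, and keyword indexing is what matters.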
