Performance testing and tuning for Autopsy 4.18

@Eugene_Livis - thanks for asking :slight_smile: I tweaked the memory settings without upping the ‘physical’ memory in the host VM running Autopsy (16GB RAM, Windows Server 2019, only running Autopsy). First I set the Autopsy JVM to 10GB and the Solr JVM to 8GB, using all 8 cores - this killed Autopsy: it hung at around 20% and I had to terminate the Autopsy/Java processes (and reboot the machine for good measure). I suspected the memory settings had killed the OS - but I am no expert on these things! The current ingest I am running is based on JVM:10GB and Solr JVM of 4GB. It started 2021-04-14 09:03 and is 54% through at 2021-04-20 10:00 (3.3TB) - 6 days so far. I am seeing periodic slow downs, but have set the ingest not to have any periodic searches, I am not sure if these errors in the logs suggest it is trying to run a search? Where is the Tika log located for further investigation?

2021-04-14 15:27:35.48 org.sleuthkit.autopsy.keywordsearch.IngestSearchRunner startJob
INFO: Resetting periodic search time out to default value
2021-04-14 15:28:24.804 org.sleuthkit.autopsy.textextractors.TikaTextExtractor getReader
WARNING: Error with file [id=269594] XXXXXXXXXX.doc, see Tika log for details…

I am seeing a few errors relating to files not existing, but I would expect that as it is a live file system:
WARNING: Error reading from file with objId = 4351355
org.sleuthkit.datamodel.TskCoreException: Error reading local file, it does not exist at local path: S:\XXXXXXXXX.docx

I would be interested in your thoughts. At the moment I was planning on letting it run (hopefully complete within another 6 days) and will run the KWS after the ingest has completed. If it goes to plan that will be a 3.3tb data set that has taken 12 days (guess/estimated at the moment) to complete with the memory settings 16GB, JVM:10GB and Solr JVM of 4GB. Do you think this is a reasonable speed or slow in your experience? The ingest snapshot suggests it is doing about 10 files per second. I have another data set of 6tb to search, I am not sure I am willing to wait 24 days for it to complete - I may have to look at a clustered Solr system for the larger dataset…

I am seeing periodic slow downs, but have set the ingest not to have any periodic searches, I am not sure if these errors in the logs suggest it is trying to run a search?

@mckw99 Short answer - the periodic searches are disabled. That “Resetting periodic search time out to default value” log message is a bit misleading, we should change it. I have looked at the code and what it means is that we are setting the “user specified default value”. Unless you must run KWS before the ingest completes, you should definitely disable the periodic searches. They start to take more and more time as the size of the index grows.

Where is the Tika log located for further investigation?

I wouldn’t worry about Tika errors. Tika is a text extraction tool that we use to extract text out of files. Sometimes it is unable to do so. But to answer your question, the Tika log (and other logs, including Solr) are located in the “C:\Users\USER_NAME\AppData\Roaming\autopsy\var\log” directory.
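If you do want a quick overview of what is in that directory, here is a minimal sketch in Python that counts lines starting with `WARNING:` in each file. It assumes the warnings look like the excerpts posted earlier in this thread (the level and message on their own line); the function name `count_warnings` is made up for illustration and is not part of Autopsy.

```python
from collections import Counter
from pathlib import Path


def count_warnings(log_dir):
    """Count lines starting with 'WARNING:' in each file directly under log_dir."""
    counts = Counter()
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if line.startswith("WARNING:"):
                    counts[path.name] += 1
    return counts
```

Call it with the log directory quoted above (with your own user name substituted) and print the returned counter to see which log file is the noisiest.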

If it goes to plan that will be a 3.3tb data set that has taken 12 days (guess/estimated at the moment) to complete with the memory settings 16GB, JVM:10GB and Solr JVM of 4GB. Do you think this is a reasonable speed or slow in your experience?

Ugh, there really isn’t such a thing as “expected ingest speed”. Everything very heavily depends on what kind of data is being ingested, and on what kind of system. I honestly haven’t tried to ingest 3+ TBs into a single user case to give you an educated answer as to whether the performance you’re seeing is slow or reasonable. Overall, I think the biggest bottleneck is having only 16GB of RAM with only 4GB of it allocated to Solr.
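As a rough sanity check on the memory settings mentioned in this thread, the two maximum heap sizes can be added up and compared with the physical RAM. This is only back-of-the-envelope arithmetic: the function name `heap_headroom_gb` is made up for illustration, a JVM uses some memory beyond its maximum heap, and a negative result does not prove that the earlier hang was caused by memory pressure.

```python
def heap_headroom_gb(physical_gb, autopsy_heap_gb, solr_heap_gb):
    """Return the GB left for the OS and everything else if both heaps fill up."""
    return physical_gb - (autopsy_heap_gb + solr_heap_gb)


# First attempt from this thread: 10GB Autopsy JVM + 8GB Solr JVM on a 16GB VM.
first_attempt = heap_headroom_gb(16, 10, 8)

# Current run: 10GB Autopsy JVM + 4GB Solr JVM on the same 16GB VM.
current_run = heap_headroom_gb(16, 10, 4)

print(first_attempt, current_run)
```

The first configuration asks for more heap than the VM has RAM, while the second leaves only a small margin for Windows itself.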

I have another data set of 6tb to search, I am not sure I am willing to wait 24 days for it to complete - I may have to look at a clustered Solr system for the larger dataset…

I would not wait 24 days either, nor would I expect that you can linearly extrapolate from 12 days to 24 days. At least with a single Solr server, the indexing speed definitely slows down greatly as the size of the index increases. So I would definitely recommend creating an Autopsy multi-user cluster. Even if you have a single Solr node, you may get a performance gain simply because Solr will be on a dedicated machine (as opposed to fighting for CPU, RAM, and disk access with Autopsy on your local machine) and running with much greater hardware resources than 4GB of RAM. If you can find the hardware resources for several Solr servers - that will make a huge difference! You can also run ingest on several machines in parallel (if you have multiple data sources), which will obviously also increase the ingest speed. For a single-user case 6TB is a significant chunk of data though, so if you are finding that the ingest seriously slows down at some point on your system, then you may want to consider splitting the input data sources into several Autopsy cases (e.g. 3TB in case1 and another 3TB in case2). You’ll have to examine the cases separately, which is obviously inconvenient, but the ingest will complete much faster.
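For reference, the naive linear estimate behind the 12-day figure can be sketched as below, using the start time, snapshot time, and 54% progress reported earlier in the thread. The function name `linear_eta` is made up for illustration. Since indexing slows down as the index grows, treat the result as an optimistic lower bound and not as a prediction.

```python
from datetime import datetime


def linear_eta(start, now, fraction_done):
    """Extrapolate total duration and finish time, assuming constant speed."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    elapsed = now - start
    total = elapsed / fraction_done  # timedelta divided by a float
    return total, start + total


# Figures reported in this thread: 54% done after roughly 6 days.
total, finish = linear_eta(
    datetime(2021, 4, 14, 9, 3),
    datetime(2021, 4, 20, 10, 0),
    0.54,
)
print(f"estimated total: {total.total_seconds() / 86400:.1f} days, finish: {finish}")
```

Doubling that number for a 6TB data set is exactly the linear extrapolation cautioned against above, so the real figure on a single Solr server would likely be worse.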

I am seeing a few errors relating to files not existing, but I would expect that as it is a live file system

@mckw99 I just want to check if you are running Autopsy on the same drive that you are trying to analyze? That can definitely lead to some problems.

Hi @Eugene_Livis, no, the dataset I am analysing is on a mapped drive to a network share (so the network card is always going to be a bottleneck). Autopsy is installed on the system drive C: and the logfiles/case is on another separate drive. Not ideal, but it is what I’ve got to work with. Unfortunately this afternoon I managed to accidentally log off the machine instead of lock it :woman_facepalming:. 7 days of ingest and I accidentally kill the process. I reopened the case and restarted the ingest - will it restart from where it was terminated or (as I suspect) restart the ingest from beginning? Again thanks for your advice, much appreciated.

Unfortunately this afternoon I managed to accidentally log off the machine instead of lock it :woman_facepalming:. 7 days of ingest and I accidentally kill the process.

@mckw99 ugh, sorry…

and the logfiles/case is on another separate drive

This will slow things down a bit further because the case database is located in the case directory.

I reopened the case and restarted the ingest - will it restart from where it was terminated or (as I suspect) restart the ingest from beginning?

No, unfortunately the ingest will be restarted from the beginning. So I would definitely create a new case, because as it stands right now you are adding to the existing index and case database, which already contain 7 days’ worth of data. And you will also see duplicates for all of the previously processed data.

Ok, so it took 6.5 days to complete - but it got there! Actually quicker than I thought it would. Issue I am having now is my Keyword search bar seems to have disappeared? It is the same no matter which case I look at. I am sure it used to be where the ? is…

Am I missing something incredibly simple?

Well that’s alarming :scream: :scream:! Please let us know if you figure out how to restore the keyword search bar.

Issue I am having now is my Keyword search bar seems to have disappeared?

@mckw99 Wow, that’s really odd, I have never seen anything like that. I assume you have restarted Autopsy and that didn’t help? The first thing I would do is try to delete your Autopsy user profile by deleting (or let’s try renaming first) the “C:\Users\elivis\AppData\Roaming\autopsy” directory. Rename the “autopsy” directory to “autopsy_old”. This way Autopsy will start completely “clean”, basically as a brand new install. All your cases and processed data will NOT be affected, but you will have to reconfigure all of the Autopsy settings. If that doesn’t fix it, then you should try deleting the “autopsy” folder again, followed by uninstalling and reinstalling Autopsy.

@Eugene_Livis Tried all the usual suspects - rebooted, deleted autopsy profile, uninstalled autopsy, re-installed. Ended up having to delete my user profile on the machine and that sorted it :roll_eyes: As far as the 3Tb search goes - it ran much faster this time, may be because autopsy was installed on the same drive as the case files…I’ll start my 6tb search next week and see how it goes… Thanks for the help.

Ended up having to delete my user profile on the machine and that sorted it

@mckw99 That’s weird. But I’m glad that it’s fixed.

As far as the 3Tb search goes - it ran much faster this time, may be because autopsy was installed on the same drive as the case files

That will definitely greatly improve performance. It will be even better if you have SSD drives.

So I thought I would chance my arm and go with the 6tb dataset (after finally getting the 3tb one to go so well). Unfortunately it’s slowed right down to 5 files a second on the progress snapshot. I think this one might just be too much for the hardware and single solution I am running, but I have attached a link to the thread dump fyi (Dropbox: Ingest thread dump 04.txt).

Just to let you know IPED version 4 enabled robust(parallel)ImageReading by default and that can make E01 processing up to 3x faster depending on your hardware and features enabled.

Best,
Luis Nassif

Please excuse me if someone already took this up. There are a lot of posts in this thread and honestly I did not read them all.

In a post from @Mark_McKinnon I learned you can change the journal mode of the built-in database to WAL. This speeds up ingest for sure. It seems to speed up indexing too.
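For anyone who wants to try it, here is a minimal sketch of how the journal mode of a SQLite file can be switched to WAL from Python. It assumes the built-in case database is a plain SQLite file. The function name `enable_wal` is made up for illustration, and this is not necessarily how the original post did it. I would close Autopsy and back up the file before touching it.

```python
import sqlite3


def enable_wal(db_path):
    """Switch a SQLite database file to WAL journaling and return the mode in effect."""
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the resulting journal mode as a single row.
        mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    finally:
        conn.close()
    return mode
```

The returned value is `wal` when the switch worked. WAL mode is stored in the database file, so it stays in effect the next time the file is opened.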

Not sure if this will be helpful or not but thought I’d throw this out there just in case.