Performance testing and tuning for Autopsy 4.18

Eugene_Livis · April 5, 2021, 8:26pm

@honor_the_data, @mckw99, @Athomas Sorry for the delayed response.

While working on integrating Solr8 into Autopsy 4.18 I have run some profiling tests. Short answer: if you are going to work with large images (TBs) and keyword search (KWS) performance is important, your best bet is to use a network Solr server. We call this “Multi-User” (MU) mode, as opposed to “Single-User” (SU) mode, which is the default Autopsy mode. The instructions on how to install a MU Solr server are located here:

https://sleuthkit.org/autopsy/docs/user-docs/4.18.0/install_solr_page.html

The instructions are extensive but the process is honestly very easy.

Some notes:

I find that a single Solr server works well up to 1TB, then the performance starts to slow down. The performance doesn’t “drop off the cliff” but it keeps slowing down as you add more data.
A single MU Solr server will probably not perform any better than an SU Autopsy case. However, in MU mode you can add additional Solr servers and create a Solr cluster. See “Adding More Solr Nodes” in the above documentation. That is where performance gains come from, especially for large input data sources. Apache Solr documentation calls this “SolrCloud” mode and each Solr server is called a “shard”. The more Solr servers/shards you have, the better performance you will have for large data sets. On our test and production clusters, we are using 4-6 Solr servers to handle data sets of up to 10TB. That seems to be the upper limit. After that, you are much better off breaking your Autopsy case into multiple cases, thus creating a separate Solr index for each case.
In my experience, a 3-node SolrCloud indexes data roughly twice as fast as a single Solr node. A 6-node SolrCloud indexes data almost twice as fast as a 3-node SolrCloud. After that I did not see much performance gain. These are all very rough figures that depend heavily on network throughput, machine resources, disk access speeds, and the type of data being indexed.
Exact match searches are MUCH faster than substring or regex searches.
Regex searches tend to use a lot of RAM on the Solr server.
I find that indexing/searching of unallocated space really slows everything down because it is mostly binary or garbled data.
If you are not going to look at the search results until ingest is over then you should disable the periodic searches. They will start taking longer as your input data grows.
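
To make the rough numbers in these notes easier to apply, here is a minimal Python sketch. It only encodes the figures quoted above (about 2x indexing speed at 3 nodes, almost 4x at 6 nodes, a single server comfortable up to about 1TB, clusters up to about 10TB). The function names and the 24-hour baseline in the usage lines are hypothetical, and the hard thresholds are a simplification of what are really gradual slowdowns, so treat the output as a planning aid rather than a prediction.

```python
# Relative indexing speed by SolrCloud node count, from the rough figures
# in the notes above (single node = 1.0). The 6-node value is optimistic:
# the post says "almost" twice as fast as 3 nodes.
RELATIVE_INDEX_SPEED = {1: 1.0, 3: 2.0, 6: 4.0}


def estimated_index_hours(single_node_hours: float, nodes: int) -> float:
    """Scale a measured single-node indexing time to a 3- or 6-node cluster."""
    if nodes not in RELATIVE_INDEX_SPEED:
        raise ValueError("only 1, 3 and 6 nodes have figures in the notes")
    return single_node_hours / RELATIVE_INDEX_SPEED[nodes]


def suggested_setup(data_tb: float) -> str:
    """Map input data size (TB) to the setup the notes recommend."""
    if data_tb <= 1:
        return "single Solr server"
    if data_tb <= 10:
        return "SolrCloud cluster (4-6 Solr servers)"
    return "split into multiple Autopsy cases"


# Hypothetical baseline: indexing took 24 hours on one Solr node.
print(estimated_index_hours(24, 3))  # 12.0
print(estimated_index_hours(24, 6))  # 6.0
print(suggested_setup(4))            # SolrCloud cluster (4-6 Solr servers)
```

Measure your own single-node time on a representative data source before relying on any of this, since network throughput, disk speed, and data type can easily outweigh the node count.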

Hope this helps. I’ll be glad to answer any other questions.
