I am seeing periodic slowdowns, but I have set the ingest not to run any periodic searches. Do these errors in the logs suggest it is trying to run a search?
@mckw99 Short answer: the periodic searches are disabled. That “Resetting periodic search time out to default value” log message is a bit misleading, and we should change it. I have looked at the code, and what it means is that we are setting the “user specified default value”. Unless you must run KWS before the ingest completes, you should definitely disable the periodic searches. They take more and more time as the size of the index grows.
Where is the Tika log located for further investigation?
I wouldn’t worry about Tika errors. Tika is a text extraction tool that we use to extract text out of files. Sometimes it is unable to do so. But to answer your question, the Tika log (and other logs, including Solr) are located in the “C:\Users\USER_NAME\AppData\Roaming\autopsy\var\log” directory.
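If it helps to see which of those logs were written most recently, here is a minimal Python sketch. It assumes that `%APPDATA%` resolves to `C:\Users\USER_NAME\AppData\Roaming` on your machine; `newest_logs` and `default_log_dir` are hypothetical helper names, not part of Autopsy, and the script does not assume any particular log file names.

```python
import os
from pathlib import Path


def newest_logs(log_dir, limit=10):
    """Return up to `limit` files in log_dir, most recently modified first."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


def default_log_dir():
    # Assumption: %APPDATA% points at C:\Users\USER_NAME\AppData\Roaming,
    # so this matches the directory quoted in the answer above.
    return Path(os.environ.get("APPDATA", "")) / "autopsy" / "var" / "log"


if __name__ == "__main__":
    log_dir = default_log_dir()
    if log_dir.is_dir():
        for path in newest_logs(log_dir):
            print(path.name, path.stat().st_size, "bytes")
    else:
        print("Log directory not found:", log_dir)
```

Sorting by modification time is just a convenience: during a long ingest, the files at the top of the list are the ones still being written to.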
If it goes to plan, that will be a 3.3TB data set that has taken 12 days (an estimate at the moment) to complete, with 16GB of memory, a 10GB JVM, and a 4GB Solr JVM. In your experience, is this a reasonable speed or slow?
Ugh, there really isn’t such a thing as “expected ingest speed”. Everything depends very heavily on what kind of data is being ingested, and on what kind of system. I honestly haven’t tried to ingest 3+ TB into a single user case, so I can’t give you an educated answer as to whether the performance you’re seeing is slow or reasonable. Overall, I think having only 16GB of RAM, with only 4GB allocated to Solr, is the biggest bottleneck.
I have another data set of 6TB to search, and I am not sure I am willing to wait 24 days for it to complete. I may have to look at a clustered Solr system for the larger dataset…
I would not wait 24 days either, nor would I expect that you can linearly extrapolate from 12 days to 24 days. At least with a single Solr server, the indexing speed slows down greatly as the size of the index increases. So I would definitely recommend creating an Autopsy multi-user cluster. Even if you have a single Solr node, you may get a performance gain simply because Solr will be on a dedicated machine (as opposed to fighting for CPU, RAM, and disk access with Autopsy on your local machine) and running with much greater hardware resources than 4GB of RAM. If you can find the hardware to run several Solr servers, that will make a huge difference! You can also run ingest on several machines in parallel (if you have multiple data sources), which will also increase the ingest speed. For a single user case, 6TB is a significant chunk of data though, so if you find that the ingest seriously slows down on your system at some point, you may want to consider splitting the input data sources into several Autopsy cases (e.g. 3TB in case1 and another 3TB in case2). You’ll have to examine the cases separately, which is obviously inconvenient, but the ingest will complete much faster.
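To illustrate why the extrapolation is not linear and why splitting helps, here is a toy Python model. It is not a measurement and not how Autopsy or Solr actually behave in detail: it simply assumes that the cost of indexing the next terabyte grows in proportion to the current index size. The function names and the `base` and `slowdown` constants are made up for illustration.

```python
def ingest_cost(size_tb, base=1.0, slowdown=0.5):
    """Relative cost of ingesting size_tb into ONE case under a toy model.

    Assumption: the cost per TB at index size s is base + slowdown * s.
    Integrating that from 0 to size_tb gives the total below.
    """
    return base * size_tb + slowdown * size_tb ** 2 / 2


def compare(total_tb, n_cases, **model):
    """Return (cost as one case, cost split evenly into n_cases cases)."""
    single = ingest_cost(total_tb, **model)
    # Each split case starts from an empty index, so it never reaches
    # the slow, large-index regime. Cases are assumed to run one after another.
    split = n_cases * ingest_cost(total_tb / n_cases, **model)
    return single, split


if __name__ == "__main__":
    single, split = compare(6, 2)
    print("one 6TB case (relative cost):", single)
    print("two 3TB cases (relative cost):", split)
```

Under any model of this shape with `slowdown` greater than zero, doubling the data more than doubles the time, and two half-size cases cost less than one full-size case. The actual numbers depend entirely on your data and hardware, so treat the output as relative, not as a prediction in days.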