@Eugene_Livis - thanks for asking. I tweaked the memory settings without upping the ‘physical’ memory in the host VM running Autopsy (16GB RAM, Windows Server 2019, only running Autopsy). First I set the JVM to 10GB and the Solr JVM to 8GB, using all 8 cores. This killed Autopsy: it hung at around 20% and I had to terminate the Autopsy/Java processes (and reboot the machine for good measure). I suspected the memory settings had killed the OS, since 10GB + 8GB is more than the 16GB the VM has, but I am no expert on these things!

The current ingest is running with JVM: 10GB and Solr JVM: 4GB. It started 2021-04-14 09:03 and was 54% through at 2021-04-20 10:00 (3.3TB), so 6 days so far. I am seeing periodic slowdowns, even though I have set the ingest not to run any periodic searches. I am not sure whether these errors in the logs suggest it is trying to run a search. Also, where is the Tika log located for further investigation?
2021-04-14 15:27:35.48 org.sleuthkit.autopsy.keywordsearch.IngestSearchRunner startJob
INFO: Resetting periodic search time out to default value
2021-04-14 15:28:24.804 org.sleuthkit.autopsy.textextractors.TikaTextExtractor getReader
WARNING: Error with file [id=269594] XXXXXXXXXX.doc, see Tika log for details…
I am seeing a few errors relating to files not existing, but I would expect that as it is a live file system:
WARNING: Error reading from file with objId = 4351355
org.sleuthkit.datamodel.TskCoreException: Error reading local file, it does not exist at local path: S:\XXXXXXXXX.docx
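To get a feel for how many of each warning there are, something like the sketch below could tally them. It only matches the two WARNING messages quoted above; the function name `tally_warnings` and the `sample` list are my own illustration, not anything shipped with Autopsy, and I am assuming the rest of the log uses the same wording.

```python
import re
from collections import Counter

# Patterns copied from the two warnings quoted above; any other message is ignored.
TIKA_ERROR = re.compile(r"WARNING: Error with file \[id=(\d+)\]")
READ_ERROR = re.compile(r"WARNING: Error reading from file with objId = (\d+)")


def tally_warnings(lines):
    """Count Tika extraction errors and file-read errors, collecting their ids."""
    counts = Counter()
    ids = {"tika": [], "read": []}
    for line in lines:
        if (m := TIKA_ERROR.search(line)):
            counts["tika"] += 1
            ids["tika"].append(int(m.group(1)))
        elif (m := READ_ERROR.search(line)):
            counts["read"] += 1
            ids["read"].append(int(m.group(1)))
    return counts, ids


# Hypothetical sample built from the log lines in this post.
sample = [
    "2021-04-14 15:28:24.804 org.sleuthkit.autopsy.textextractors.TikaTextExtractor getReader",
    "WARNING: Error with file [id=269594] XXXXXXXXXX.doc, see Tika log for details…",
    "WARNING: Error reading from file with objId = 4351355",
]
counts, ids = tally_warnings(sample)
print(counts, ids)
```

To run it against a real log, pass an open file handle for the Autopsy log in place of `sample`.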
I would be interested in your thoughts. At the moment I am planning to let it run (hopefully it completes within another 6 days) and then run the KWS after the ingest has finished. If it goes to plan, that will be a 3.3TB data set that has taken 12 days (an estimate at the moment) to complete with 16GB RAM, JVM: 10GB and Solr JVM: 4GB. Do you think this is a reasonable speed, or slow in your experience? The ingest snapshot suggests it is doing about 10 files per second. I have another data set of 6TB to search, and I am not sure I am willing to wait 24 days for it to complete. I may have to look at a clustered Solr system for the larger dataset…
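For what it is worth, here is a rough sketch of the arithmetic behind those estimates. The dates, the 54% figure and the data set sizes are the ones from this post; the assumption that progress is linear in time (and that the 6TB set ingests at the same rate per TB) is mine and may well not hold given the slowdowns.

```python
from datetime import datetime

# Figures taken from this post; the linear-progress assumption is a guess.
start = datetime(2021, 4, 14, 9, 3)
checked = datetime(2021, 4, 20, 10, 0)
fraction_done = 0.54
dataset_tb = 3.3


def estimate_total_days(start, checked, fraction_done):
    """Extrapolate total run time, assuming progress is linear in time."""
    elapsed_days = (checked - start).total_seconds() / 86400
    return elapsed_days / fraction_done


total_days = estimate_total_days(start, checked, fraction_done)
days_per_tb = total_days / dataset_tb
six_tb_days = days_per_tb * 6

print(f"Estimated total for {dataset_tb}TB: {total_days:.1f} days")
print(f"Estimated total for 6TB at the same rate: {six_tb_days:.1f} days")
```

On these numbers it comes out at roughly 11 days for the 3.3TB set and about 20 days for 6TB, so my 12 and 24 day figures are on the pessimistic side.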