SLOW Ingest with thread dump..."tika-reader-0" Id=4456649 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer

Hi there, I am running a keyword search on a large dataset - 3 TB - although only a subset of file types within that data are searched. It started off great, storming through the files, but it has now slowed down considerably. Here is an example thread dump (Dropbox - Ingest Thread Dump.txt). I don't know much about Java - can anyone shed any light on what is going on and, more importantly, can I rectify it while the ingest is running? It has been running for 4 days now and I really don't want to be forced into starting the ingest again… The machine it is running on has 8 cores (4 threads running), 16 GB RAM and 800 GB of disk space (the case is so far 90 GB at 31% completed). TIA

We are taking a look at this at Basis Technology. Thanks for the thread dump!

@mckw99, unfortunately, you have run afoul of a previously unknown deadlock bug. Your ingest job will never complete. Sorry about that.

We have a fix for the problem and are looking at releasing it quickly.

Richard, I really appreciate you looking into this for me and letting me know what the issue is. I have an even larger data set (6 TB+) to run this keyword search on after this one. I will keep an eye on the website for an update, but I would really appreciate it if you could reply on this thread when the fix is released?
Again, thanks for the help.

@mckw99, sure thing, I am building an Autopsy 4.18.0 installer now, which I will hand off to Brian Carrier for release after my team and/or I do a final smoke test. I expect we will be able to release early next week.

For what it’s worth, the deadlock is a timing thing and will not happen every time. You might be just fine trying your next data set, if you can’t afford to wait.

For future reference, if you look at the thread dump you captured, you will find that the deadlock is quite literally reported there. If you ever do a thread dump and it tells you there is a deadlock, the threads involved are stalled "forever" and an application restart is your only recourse. And of course, sharing the thread dump with us allows us to eliminate the deadlock potential.
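For anyone else reading along: tools like jstack print an explicit "Found one Java-level deadlock:" section at the end of the dump when a deadlock exists, so that is the text to search for rather than the many harmless WAITING lines. As a rough, standalone sketch (not part of Autopsy, just an illustration of the standard ThreadMXBean API), the same check can be made programmatically:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: ask the JVM for threads that are in a Java-level deadlock,
// which is the same information jstack reports under "Found one Java-level deadlock:".
public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // returns null if no deadlock exists
        if (ids == null) {
            System.out.println("No deadlocked threads found.");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.printf("\"%s\" Id=%d %s, blocked on %s owned by \"%s\"%n",
                    info.getThreadName(), info.getThreadId(), info.getThreadState(),
                    info.getLockName(), info.getLockOwnerName());
        }
    }
}
```

Ordinary WAITING threads (for example, worker threads idling on a queue) are normal and are not reported by this check; only a genuine circular lock dependency is.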

@mckw99, Autopsy 4.18.0, which includes the fix for the deadlock you experienced, has been released. Thanks again for sharing your thread dump, which allowed us to quickly pinpoint the problem and apply the patch.

Thanks Richard! I’ve installed it and it seems to be working so far - 20% through 3.3 TB after running for approximately 30 hours, maybe slightly longer (I forgot to note when I kicked it off!). This time I have excluded periodic keyword searches to try to speed up the ingest. Here’s the latest thread dump: Dropbox - Ingest Thread Dump - 2.txt. I am not sure if I am still seeing a deadlock in the thread dump - I’m not sure of the specific syntax I should be looking for? I was working on the assumption it has to do with “WAITING on java.util.concurrent.locks”? Thanks for your help.

@Richard_Cordovano Just FYI, it seems to have happened again. 40% through a 3.3 TB dataset and it slowed down to an unusable speed. This post won’t let me link the most recent thread dump, so I created a new post with the latest link: Slow ingest dump - version 4.18 - Keyword search - Autopsy Help - Autopsy and The Sleuth Kit. This is running single user; Eugene Livis suggested setting up a Solr server as a cluster in a ‘Multi-user’ configuration. I will look at this option, but in the meantime I thought you might like to take a look at the thread dump.