Analysis not progressing, hung on processing an .htm file

I’m using version 4.17.0 (and have also used 4.16.0 with the same results) to process an E01 image and it’s been running for about 15 hours and has been hung at 10% for most of that time. It appears to be stuck on the file HtmlReceipt.htm. This file appears to be some type of an e-Delivery receipt verification from rpost.com. I noticed there are 655 of them in the directory and they vary in size from 15KB to 36MB. Any suggestions as to how to determine what the issue is, or how to get Autopsy to skip these and continue?

Hi Kevin,

Are are you able to share your case log file? It is in the case folder/Log and will be the file named autopsy.log.0.

Kind regards.

Mark

@kgeorge and @Mark_McKinnon

In addition to looking at the Autopsy logs, here are a few more things that could help pinpoint what is going on if the ingest is still in progress:

  • In the Help menu, there is a Get Ingest Progress Snapshot menu item that will invoke a dialog that shows what module is running in each ingest job thread, how long that module has been running, and what file it is running on. At this moment, we know about the file and the time, but we don’t know which module is hung up.
  • Also in the Help menu, new in Autopsy 4.17.0, is a Thread Dump menu item. This will reveal what every thread in the application is doing, which may or may not be helpful here.

Hi Mark,

Here is the contents of the autopsy.log.0 file, I restarted processing this morning and it’s been running all day.

2020-11-12 08:17:02.644 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: Visual C Runtime libraries loaded
2020-11-12 08:17:02.647 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: ZLIB library loaded loaded
2020-11-12 08:17:02.651 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: EWF library loaded
2020-11-12 08:17:02.655 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: VMDK library loaded
2020-11-12 08:17:02.659 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: VHDI library loaded
2020-11-12 08:17:02.666 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: Crypto library loaded
2020-11-12 08:17:02.672 org.sleuthkit.autopsy.core.Installer loadDynLibraries
INFO: OpenSSL library loaded
2020-11-12 08:17:02.672 org.sleuthkit.autopsy.core.Installer
INFO: core installer created
2020-11-12 08:17:03.934 org.sleuthkit.autopsy.core.Installer validate
INFO: validate()
2020-11-12 08:17:03.935 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.coreutils.Installer validate()
2020-11-12 08:17:03.935 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.corecomponents.Installer validate()
2020-11-12 08:17:03.935 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.datamodel.Installer validate()
2020-11-12 08:17:03.992 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.ingest.Installer validate()
2020-11-12 08:17:03.992 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.centralrepository.eventlisteners.Installer validate()
2020-11-12 08:17:03.992 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.healthmonitor.Installer validate()
2020-11-12 08:17:03.992 org.sleuthkit.autopsy.core.Installer validate
INFO: org.sleuthkit.autopsy.casemodule.Installer validate()
2020-11-12 08:17:04.364 org.sleuthkit.autopsy.timeline.TimeLineModule onStart
INFO: Setting up TimeLine listeners
2020-11-12 08:17:04.487 org.sleuthkit.autopsy.core.Installer initializeSevenZip
INFO: 7zip-java bindings loaded
2020-11-12 08:17:04.487 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: Default charset: windows-1252
2020-11-12 08:17:04.488 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: Default file encoding: Cp1252
2020-11-12 08:17:04.488 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: Java runtime version: 1.8.0_222-1-ojdkbuild-b10
2020-11-12 08:17:04.488 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: Netbeans Platform build: 11.3-6b879cb782eaa4f13a731aff82eada11289a66f7
2020-11-12 08:17:04.488 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: Application name: Autopsy, version: 4.17.0, build: RELEASE
2020-11-12 08:17:04.488 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: os.name: Windows 10
2020-11-12 08:17:04.488 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: os.arch: amd64
2020-11-12 08:17:04.501 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: PID: 320
2020-11-12 08:17:04.545 org.sleuthkit.autopsy.coreutils.Installer restored
INFO: Process Virtual Memory Used: 2647175168
2020-11-12 08:17:04.545 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.coreutils.Installer restore succeeded
2020-11-12 08:17:04.546 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.corecomponents.Installer restore succeeded
2020-11-12 08:17:04.546 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.datamodel.Installer restore succeeded
2020-11-12 08:17:04.547 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.ingest.Installer restore succeeded
2020-11-12 08:17:04.548 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.centralrepository.eventlisteners.Installer restore succeeded
2020-11-12 08:17:04.558 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.healthmonitor.Installer restore succeeded
2020-11-12 08:17:04.559 org.sleuthkit.autopsy.core.Installer restored
INFO: org.sleuthkit.autopsy.casemodule.Installer restore succeeded
2020-11-12 08:17:04.559 org.sleuthkit.autopsy.core.Installer restored
INFO: Autopsy Core restore completed
2020-11-12 08:17:04.563 org.sleuthkit.autopsy.keywordsearch.KeywordSearchSettings setDefaults
INFO: Detecting default settings.
2020-11-12 08:17:04.885 org.sleuthkit.autopsy.keywordsearch.Server
INFO: Created Server instance using Java at C:\Program Files\Autopsy-4.17.0\jre\bin\java
2020-11-12 08:17:04.887 org.sleuthkit.autopsy.keywordsearch.Server start
INFO: Starting Solr server from: C:\Program Files\Autopsy-4.17.0\autopsy\solr
2020-11-12 08:17:04.887 org.sleuthkit.autopsy.keywordsearch.Server start
INFO: Port [23,232] available, starting Solr
2020-11-12 08:17:04.89 org.sleuthkit.autopsy.keywordsearch.Server runSolrCommand
INFO: Running Solr command: [C:\Program Files\Autopsy-4.17.0\jre\bin\java, -Xmx512m, -DSTOP.PORT=34343, -Djetty.port=23232, -DSTOP.KEY=jjk#09s, -jar, start.jar, -Dbootstrap_confdir=…/solr/configsets/AutopsyConfig/conf, -Dcollection.configName=AutopsyConfig]
2020-11-12 08:17:04.961 org.sleuthkit.autopsy.keywordsearch.Server runSolrCommand
INFO: Finished running Solr command
2020-11-12 08:17:15.1 org.sleuthkit.autopsy.keywordsearch.Server isRunning
INFO: Solr server is running
2020-11-12 08:17:15.296 org.sleuthkit.autopsy.keywordsearch.Server start
INFO: New Solr process PID: [17856]
2020-11-12 08:17:16.342 org.sleuthkit.autopsy.casemodule.StartupWindowProvider init
INFO: Will use the default startup window: org.sleuthkit.autopsy.casemodule.StartupWindow[dialog0,0,0,477x321,invalid,hidden,layout=java.awt.BorderLayout,APPLICATION_MODAL,title=Welcome,defaultCloseOperation=HIDE_ON_CLOSE,rootPane=javax.swing.JRootPane[,8,31,461x282,layout=javax.swing.JRootPane$RootLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=16777673,maximumSize=,minimumSize=,preferredSize=],rootPaneCheckingEnabled=true]
2020-11-12 08:17:16.452 org.sleuthkit.autopsy.corecomponents.DataContentViewerHex
INFO: Created HexView instance: org.sleuthkit.autopsy.corecomponents.DataContentViewerHex[,0,0,0x0,invalid,layout=javax.swing.GroupLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=9,maximumSize=,minimumSize=,preferredSize=java.awt.Dimension[width=100,height=58]]
2020-11-12 08:17:16.603 org.sleuthkit.autopsy.coreutils.StringExtract$StringExtractUnicodeTable init
INFO: initialized, unicode table loaded
2020-11-12 08:17:17.333 org.sleuthkit.autopsy.contentviewers.MediaFileViewer
INFO: Created MediaView instance: org.sleuthkit.autopsy.contentviewers.MediaFileViewer[,0,0,0x0,invalid,layout=java.awt.CardLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=9,maximumSize=,minimumSize=,preferredSize=]
2020-11-12 08:17:17.357 org.sleuthkit.autopsy.contentviewers.FileViewer
INFO: Created ApplicationContentViewer instance: org.sleuthkit.autopsy.contentviewers.FileViewer[,0,0,0x0,invalid,layout=javax.swing.OverlayLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=9,maximumSize=,minimumSize=,preferredSize=]
2020-11-12 08:17:18.405 org.sleuthkit.autopsy.texttranslation.translators.GoogleTranslator loadTranslator
INFO: No credentials file has been provided for Google Translator
2020-11-12 08:17:18.405 org.sleuthkit.autopsy.texttranslation.translators.GoogleTranslator loadTranslator
WARNING: Credentials were not successfully made, no translations will be available from the GoogleTranslator
2020-11-12 08:17:18.585 org.sleuthkit.autopsy.corecomponents.DataContentTopComponent
INFO: Created DataContentTopComponent instance: org.sleuthkit.autopsy.corecomponents.DataContentTopComponent[Data Content,0,0,0x0,invalid,layout=javax.swing.BoxLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=256,maximumSize=,minimumSize=,preferredSize=]
2020-11-12 08:17:45.578 org.sleuthkit.autopsy.casemodule.Case openAsCurrentCase
INFO: Opening XXXX in D:\Forensics\Cases\XXXX as the current case

Hi Richard,

According to the Ingest Snapshot under Thread ID it looks like it’s performing two instances of a Keyword Search on the file HtmlReceipt.htm (Thread ID 2 and 3). One has been processing for 3:05:45 and the other for 1:01:24 (entire analysis is hung again at 10% and was started around 8:30 AM on 11/12/20). Under Job ID/Data Source IMAGE1.E01 Files/Sec is roughly .986. It appears that it is taking a long time to perform a keyword search on these files.

Thread Dump shows the following for ID=2 (also similar for ID=3). Not sure what this means yet, but it does give me a starting point to research. Thanks.

“Reference Handler” Id=2 WAITING on java.lang.ref.Reference$Lock@e2cf751
at java.lang.Object.wait(Native Method)
- waiting on java.lang.ref.Reference$Lock@e2cf751
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

Hi @kgeorge, note that those should be two files with the same name. Each file gets passed down the modules pipeline once in an ingest job. Unfortunately, the ingest snapshot does not indicate whether the files are being indexed for keyword search or being searched.

That particular snippet of the thread dump does not seem significant - at any given time, many threads will be waiting for some lock.

I know this is seldom possible, but if you were able to share the complete thread dump, I would be happy to look at it. If you could extract and share the problem files, someone here at Basis Technology, either myself or someone on my team, could step through the processing of the files in a debugger to see if we can reproduce the problem. Like I said, I know that data usually cannot be shared, but it doesn’t hurt to ask since remote debugging is so difficult.

Hi @Richard_Cordovano, if you can provide me with an email address I can send you a zip file containing the .htm files and the ThreadDump file. Thanks for your help!

@Mark_McKinnon and @Richard_Cordovano,

I just wanted to provide an update on this. To experiment with trying to determine what was going on I restarted processing the image and ran one Ingest module at a time and each one completed as it should. Keyword Search took the longest time, slightly longer than two days. So what I suspect was happening was in running all Ingest modules at the same time (I had 15 of them selected) Keyword Search was taking the longest time due to the number of files in the image and made it appear that the entire image was not processing, although other Ingest modules were still running.

It also helped to get a better understanding that things were still running by using the Ingest Progress Snapshot as well as the Ingest Jobs under the Summary tab to watch the progression.

@kgeorge, you can send the zip to rcordovano@basistech.com. Thanks!

@kgeorge, a Basis engineer has reproduced the hang you experienced while running in a debugger. The hang is occurring when we call into a third party language detector. We added the call fairly recently, in connection with support for Japanese language keyword search. The engineer says:

Looks like internally they are doing some kind of regex matcher “replaceAll” operations, and those are getting stuck and using up all of the CPU. I’m not sure what we can do about this. I have also verified that if I comment out the call to languageDetector.detect(), the ingest on the data source completes in 13 seconds.

@carrier and I will need to talk about options.

@Richard_Cordovano, thanks for the update. Keep me informed, I’m interested to know how it gets resolved.

@kgeorge, I have an update for you.

First, just FYI, the indexing of the files eventually completes. It just takes a long time. The time is spent trying to do language detection using regular expressions and other techniques on the large block of base 64 encoded data at the end of the files.

The files are not recognized as HTML files by Tika (Perhaps because of the base 64 data? Not sure yet), so we fall back to “strings extraction” instead of Tika text extraction. Since strings extraction pulls out simple ASCII text, it makes no sense to do language detection on the result. So we have disabled language detection for strings extraction. The files are then processed in 13 seconds. This fix will be in the next release, currently scheduled for January 2021, i.e., soon.

Longer term, we have created a “story” in our internal Jira instance about handling base 64 encoded data during text extraction. This story is not currently allocated to a release. We will be doing planning for the March 2021 release soon, but I won’t know whether that additional “story” will be in the release until we prioritize work for the release. We have a small team and lots of stories!

1 Like

P.S.:
@kgeorge, thanks for your help with the reporting and diagnosis of this issue.

1 Like

@Richard_Cordovano, you’re welcome, and glad you were able to figure out what was going on. Looking forward to the next release.