Keyword search : PST (mail with 7z attached file)

Hi all,

Consider this SCENARIO 1:
  • Embedded File Extraction Module
  • Keyword Search
  • PST-mail (pst file has only one mail with only one ZIP attached file)


Everything works fine on Scenario N.1 :

Results - Keyword Hits = 1
Indexed Text is "Intelligible"

Consider this SCENARIO 2 :

  • Embedded File Extraction Module
  • Keyword Search
  • PST-mail (pst file has only one mail with only one 7Z attached file)

In Scenario N.2 :

Results - Keyword Hits = 0 (searching the same "Scenario 1 keyword")
Indexed Text is NOT "Intelligible"

So my question :

Are attached 7z files (zipped with LZMA or BZip2 compression method) supported in Embedded File Extraction Module + Keyword Search on PST files?

Thanks in advance for your suggestions

Luca

I made two 7z files with bzip2 and lzma and they seemed to work.

To figure out where the issue is, I’d suggest trying something simpler than going through the email parser, then the embedded file extractor, and then keyword search. You can right-click and extract your .7z files to disk, make a new case adding them as a logical data source, and then run just the embedded file extractor to see whether any files get extracted, without running keyword search.

I’ve made a new case adding the 7z files as a logical data source.
Then I ran the embedded file extractor and all files were extracted correctly.

However, processing (keyword search + embedded file extractor) an Outlook .pst file containing a mail with the same 7z file attached fails …

Thanks in advance for your support.

Any chance you can share your pst file? Send me a PM if it’s possible.

Meanwhile, I’m confused about what I’m seeing in your second screenshot. If the email parser found the attachment I believe it should show up in its compressed form as a child of the .pst file, like these gifs:

But I don’t see any children under your SEVENZIP.pst, so it doesn’t seem like there was even anything for the embedded file extractor to run on. This would suggest it’s a problem with the email parser. What did it look like in your working case? (I can’t seem to make your first screenshot larger)

Thank you for sharing your .pst files. They both work for me. I’m using Autopsy 4.14.0 on Windows 10. Here’s what I did:

Added both .pst files as a logical file set:

Ran embedded file extractor, email parser, and keyword search (and hash lookup to verify that the files were different):

In the tree, I can see the .7z files extracted by the email parser under each of the .pst files. If I click on them, I can then look at the pdf extracted by the embedded file extractor module. I can see “zanzara” in the indexed text in both, and doing a keyword search for it does work.

Can you try again doing that exact procedure? If it doesn’t work, see if there’s anything in the log (go to Help->Open log folder to find the logs)

Hi Ann,

 I'm using ONLY embedded file extractor + keyword search (both flagged in configure ingest module window). No e-mail parser.

Suppose for a moment that email-parser ingestion doesn’t exist.

The combo “embedded file extractor and keyword search” has always worked perfectly and catches all my keywords, except when a 7z file is attached to a mail…

Can you replicate this using only the embedded file extractor + keyword search ingest modules?

Luca

I wouldn’t expect it to work without email parser. The email parser pulls out the 7zip file - Autopsy wouldn’t know about it otherwise. Then the 7zip file is decompressed by the embedded file extractor. Without email parser you can still run keyword search on the original pst file but it’s just going to see the compressed data so you probably won’t see anything.
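To illustrate why a search over the still-compressed data finds nothing, here is a minimal Python sketch. It is not how Autopsy or its keyword search module is implemented. It assumes a naive byte-level search as a stand-in for the index lookup, uses a made-up sample text, and uses the `lzma` and `bz2` streams from the Python standard library rather than a real 7z container.

```python
import bz2
import lzma

KEYWORD = b"zanzara"  # the search term mentioned in this thread

# Hypothetical stand-in for the text of the attached document.
document_text = b"Sample document text containing the keyword zanzara. " * 20


def contains_keyword(data: bytes, keyword: bytes = KEYWORD) -> bool:
    """Naive byte-level search, standing in for a keyword index lookup."""
    return keyword in data


# Compress the same text with the two methods discussed in this thread.
compressed = {
    "lzma": lzma.compress(document_text),
    "bzip2": bz2.compress(document_text),
}
decompressors = {"lzma": lzma.decompress, "bzip2": bz2.decompress}

for method, blob in compressed.items():
    raw_hit = contains_keyword(blob)  # searching the compressed bytes
    extracted_hit = contains_keyword(decompressors[method](blob))  # after extraction
    print(f"{method}: compressed={raw_hit}, decompressed={extracted_hit}")
```

The keyword only becomes searchable once something has decompressed the attachment, which is the role the embedded file extractor plays after the email parser has pulled the archive out of the .pst.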

Is there some reason you don’t want to run email parser?

Note that the embedded file extractor only runs on archives and documents, not .pst files. So it’s not going to extract the archives from the .pst, but it will extract files from the archive attachment extracted from the email parser module.

http://sleuthkit.org/autopsy/docs/user-docs/4.15.0/embedded_file_extractor_page.html

If the file is zipped with a .zip extension, it works … and the keyword is “intercepted”.

Check the first part of video.mp4.

Now you can find also in Dropbox :

https://www.dropbox.com/sh/dxor8wz8owa3dqv/AAC5FRXyZRzw9jHVblhdCc8sa?dl=0

a file named ZIPPED_ANN.pst : mail is the same, file zipped is the same pdf file, but pdf file is “zip compressed” NOT “7z compressed”.

Now Scenario A : create a case with only ZIPPED_ANN.pst and run embedded file extractor, email parser, and keyword search.

Now SCENARIO B : create a case with only SEVENZIP_LZMA.pst and run embedded file extractor, email parser.

Autopsy’s behaviour is different, the keyword hits are different, and so on …

Can you explain why?

If I run only keyword search on ZIPPED_ANN.pst, I do indeed see nice indexed text. So it has nothing to do with our embedded file extractor. My guess is that Solr/Tika (which is what we use for keyword search indexing) can do some basic parsing of .pst files and decompression, but probably doesn’t support LZMA.
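As a rough illustration of that guess (not Tika's or Solr's actual logic, which I haven't checked), here is a toy Python sketch of an indexer that recognises containers by their leading signature bytes, unpacks ZIP, and falls back to the raw bytes for anything else. The names `detect_container` and `toy_index_text` are hypothetical, and the "7z" sample is just the 7z signature followed by filler bytes, not a valid archive.

```python
import io
import zipfile

# Well-known leading signatures (magic bytes) of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"
SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"


def detect_container(data: bytes) -> str:
    """Classify a blob by its first bytes."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(SEVENZ_MAGIC):
        return "7z"
    return "unknown"


def toy_index_text(data: bytes) -> str:
    """Hypothetical indexer: unpacks ZIP, otherwise returns the raw bytes as text."""
    if detect_container(data) == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return "\n".join(
                archive.read(name).decode("utf-8", errors="replace")
                for name in archive.namelist()
            )
    # Unsupported container: the "indexed text" is just the undecoded bytes.
    return data.decode("latin-1")


# Build a small ZIP in memory with made-up content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("note.txt", "the keyword zanzara is in here")
zip_bytes = buffer.getvalue()

# Stand-in bytes: a 7z signature plus filler, NOT a valid 7z archive.
fake_7z_bytes = SEVENZ_MAGIC + bytes(range(32))

print(detect_container(zip_bytes), "zanzara" in toy_index_text(zip_bytes))
print(detect_container(fake_7z_bytes), "zanzara" in toy_index_text(fake_7z_bytes))
```

An indexer built this way would show readable text for the ZIP attachment and unintelligible text for the 7z one, which matches the difference between the two scenarios.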

Thanks Ann.

My feeling is that there is some overlap between the parsing and decompression of Solr / Tika (used by keyword search indexing) and Autopsy e-mail-parser ingest process.

This overlap can undeniably lead to some confusion and double result indexing …

I really appreciate your suggestion/support

Best regards.
Luca