OCR can't read PDF scanned document

ritcharmoh · September 26, 2020, 6:20pm

Hi, I’m new here. I tried the Keyword Search module with its OCR option enabled. It could read text from image files, but it couldn’t read text from a scanned PDF document. Is there any way to solve this issue? I’d really appreciate your help.

downey · September 28, 2020, 5:37pm

What is the MIME type of your document? How was it created?
OCR only processes “image” MIME types.
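
For anyone unsure what type a file really is, here is a minimal Python sketch that guesses the MIME type from the leading bytes rather than the extension. It is only an illustration and not Autopsy's own detection code; the names `sniff_mime`, `is_ocr_candidate`, and `sniff_file` are made up for this example, and the signature list is deliberately short.

```python
# Leading "magic" bytes for a few common formats (not exhaustive).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"II*\x00", "image/tiff"),   # little-endian TIFF
    (b"MM\x00*", "image/tiff"),   # big-endian TIFF
]


def sniff_mime(header: bytes) -> str:
    """Guess a MIME type from the first bytes of a file."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"  # unknown to this short list


def is_ocr_candidate(header: bytes) -> bool:
    """True if the file would count as an "image" MIME type."""
    return sniff_mime(header).startswith("image/")


def sniff_file(path) -> str:
    """Read the first few bytes of a file on disk and guess its type."""
    with open(path, "rb") as fh:
        return sniff_mime(fh.read(16))
```

A scanned PDF starts with `%PDF-`, so it is reported as `application/pdf` and `is_ocr_candidate` returns False for it, even though every page inside is just a picture.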

Alan_Browne · September 29, 2020, 8:20am

I agree with downey that OCR will only process “image” MIME types. To overcome this problem, a module could be developed that converts the PDF to a series of images and inserts them back as derived files. Another method is to extract the text in the PDF to a text file using the pdf2text Python module.
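
The first idea (PDF pages to images) could be sketched roughly as below. This is a standalone preprocessing script, not an Autopsy module, and it assumes the third-party PyMuPDF package (`fitz`) is installed; the function names, the file naming scheme, and the 300 dpi default are choices made for this example only. Adding the images back as derived files would still need to be done through Autopsy itself.

```python
from pathlib import Path


def page_image_path(pdf_path, page_index, out_dir):
    """Build the output name for one page, numbering pages from 1."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page{page_index + 1:04d}.png"


def pdf_to_images(pdf_path, out_dir, dpi=300):
    """Render every page of a PDF to a PNG file and return the paths."""
    import fitz  # PyMuPDF (assumed installed); imported lazily on purpose

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            target = page_image_path(pdf_path, index, out_dir)
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page
            pix.save(str(target))           # format follows the .png suffix
            written.append(target)
    return written
```

The resulting PNG files have an “image” MIME type, so they can be added to a case as a logical file set and picked up by OCR. Note that the second idea, plain text extraction, only helps when the PDF already contains a text layer; a pure scan usually does not.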

ritcharmoh · October 11, 2020, 4:06pm

It was a “.pdf” file produced by a Fujitsu scanner. Since I am a fraud examiner, I need software that can read scanned PDF documents. I usually use Nuix for keyword analysis.
Thanks for your response.

By the way, is there any third-party Autopsy module that can solve this issue?

apriestman · October 11, 2020, 5:52pm

The upcoming release will be able to OCR scanned PDFs.
