Hey Everyone,
I'm working on my first Python module and I'm wondering what the best way to read file content is. I've seen examples that write the file to a temporary location first, but those seem to be aimed at running another executable on the file afterwards. In my case I just need the content: it's plain text data, and I can process it within the module.

The module is a Data Source Ingest Module, and so far I've had no trouble collecting the files I need and getting the metadata I need. I'd appreciate any advice on reading the actual content. Based on some of the AbstractFile documentation I think it's possible, but I'm not sure of the best approach.


Try this and see if it works for you.

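import jarray
from java.util.logging import Level
from org.sleuthkit.autopsy.casemodule import Case
from org.sleuthkit.datamodel import ReadContentInputStream

# Inside your module's process(self, dataSource, progressBar) method:
fileManager = Case.getCurrentCase().getServices().getFileManager()
files = fileManager.findFiles(dataSource, "%")
numFiles = len(files)
self.log(Level.INFO, "found " + str(numFiles) + " files")

for file in files:
    fileContent = ReadContentInputStream(file)
    # Java byte[] sized to hold the whole file
    fileBuffer = jarray.zeros(file.getSize(), "b")
    # read() fills fileBuffer and returns the number of bytes read
    filebytes = fileContent.read(fileBuffer)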
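One caveat with calling read() once: ReadContentInputStream is a java.io.InputStream, and InputStream.read(byte[]) is allowed to fill only part of the buffer and return a smaller count (or -1 at end of stream). A defensive module keeps reading until the buffer is full. Below is a minimal sketch of that loop in plain CPython 3, using readinto() on a file-like object as a stand-in for Java's read(byte[], off, len). The names read_fully and ChunkedStream are hypothetical, for illustration only; in Jython the loop body would call the stream's read(fileBuffer, total, size - total) instead.

```python
import io


def read_fully(stream, size):
    """Read up to `size` bytes, looping until the buffer is full or EOF.

    A single read may return fewer bytes than requested, so we keep
    reading into the unfilled tail of the buffer.
    """
    buf = bytearray(size)
    view = memoryview(buf)
    total = 0
    while total < size:
        n = stream.readinto(view[total:])
        if not n:  # 0 means end of stream; Java's InputStream.read() returns -1 instead
            break
        total += n
    return bytes(buf[:total])


class ChunkedStream(io.RawIOBase):
    """Hypothetical stream that hands back at most `chunk` bytes per call."""

    def __init__(self, data, chunk=2):
        super().__init__()
        self._src = io.BytesIO(data)
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        data = self._src.read(min(self._chunk, len(b)))
        b[:len(data)] = data
        return len(data)


# A single readinto() on this stream would only return 2 bytes
print(read_fully(ChunkedStream(b"abcdef"), 6))  # b'abcdef'
```

If a file is shorter than expected, read_fully simply returns what it got, so compare the result's length with file.getSize() if that matters to your module.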

Thanks @Mark_McKinnon, that was pretty close to what I had, based on the example ingest module. I realized I was overthinking it: the content ends up in the fileBuffer array, not in filebytes (read() only returns the number of bytes it read). There may be a cleaner way to convert the array to text using Jython/Java, but this was the easiest way for me to do it in plain Python:

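rawFile = ReadContentInputStream(file)
fileBuffer = jarray.zeros(file.getSize(), "b")
# read() fills fileBuffer; filebytes is only the number of bytes read
filebytes = rawFile.read(fileBuffer)
# The first two bytes were -1 and -2 (0xFF 0xFE unsigned). That is most likely
# a UTF-16 little-endian byte order mark rather than text content, so skip it.
byte_array = fileBuffer[2:]

fileContent = ""
for byte in byte_array:
    # Java bytes are signed (-128..127); & 0xFF maps them to 0..255 for chr()
    fileContent += chr(byte & 0xFF)
# Drop the NUL bytes that UTF-16 stores alongside each ASCII character
fileContent = fileContent.replace('\x00', '')
# Logging only as a sanity check; you wouldn't want to keep this in the module.
self.log(Level.INFO, fileContent)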
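On the cleaner-conversion question: the two leading bytes -1 and -2 are 0xFF 0xFE as unsigned values, which is the UTF-16 little-endian byte order mark, and the '\x00' bytes being stripped are what UTF-16 stores alongside each ASCII character. If the files really are UTF-16 (an assumption based only on those symptoms), decoding the bytes is more robust than skipping two bytes and deleting NULs, which garbles any character above U+00FF. Here is a minimal CPython 3 sketch; java_bytes_to_text is a hypothetical helper name, and on the Jython side Java's String(byte[], String charsetName) constructor may be another route worth checking.

```python
def java_bytes_to_text(signed_bytes, encoding="utf-16"):
    """Decode a sequence of signed Java bytes (-128..127) into a str.

    Assumes the data is UTF-16 by default; pass another encoding
    (for example "utf-8") if your files use one.
    """
    # & 0xFF turns each signed Java byte into its unsigned 0..255 value
    raw = bytes(b & 0xFF for b in signed_bytes)
    # The "utf-16" codec reads the byte order mark (FF FE or FE FF),
    # uses it to pick the byte order, and leaves it out of the result
    return raw.decode(encoding)


# "hi" saved as UTF-16LE with a BOM, as a Java byte[] would hold it
print(java_bytes_to_text([-1, -2, 104, 0, 105, 0]))  # hi
# U+00E9 is 0xE9 0x00 in UTF-16LE; 0xE9 is -23 as a signed byte
print(java_bytes_to_text([-1, -2, -23, 0]) == "\u00e9")  # True
```

This relies on Python 3's bytes(); under Jython (Python 2.7) the equivalent would be to build a str with "".join(chr(b & 0xFF) for b in fileBuffer[:filebytes]) and call .decode("utf-16") on it, which returns a unicode string.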