Reading File Content - Python Module Development

Hey Everyone,
I’m working on my first Python module and am wondering what the best way to read file content would be. I’ve seen examples that write the file to a temporary location first, but those seem to be centered on running another executable on the file afterwards. In my case I just need the content, since it’s just text data and I can process it within the module.

The module is a Data Source Ingest Module, and so far I’ve had no trouble collecting the files and metadata I need. I’d appreciate any advice on reading the actual content. I think it’s possible based on some of the AbstractFile object documentation, but I’m not sure about the best approach.

Thanks!
-Nate

Try this and see if it works for you.

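import jarray
from java.util.logging import Level
from org.sleuthkit.autopsy.casemodule import Case
from org.sleuthkit.datamodel import ReadContentInputStream

# dataSource is the Content object passed to the module's process() method
fileManager = Case.getCurrentCase().getServices().getFileManager()
files = fileManager.findFiles(dataSource, "%")
numFiles = len(files)
self.log(Level.INFO, "found " + str(numFiles) + " files")

for file in files:
    fileContent = ReadContentInputStream(file)
    # Signed-byte buffer sized to the whole file
    fileBuffer = jarray.zeros(file.getSize(), "b")
    # read() returns the number of bytes read; the data itself lands in fileBuffer
    filebytes = fileContent.read(fileBuffer)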
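
One thing to watch in the snippet above: a Java-style read(buffer) call returns a count, and the data itself is written into the buffer you pass in. A single call is also not guaranteed to fill the buffer. Here is a minimal sketch of that pattern in plain Python, using io.BytesIO.readinto as a stand-in for the stream. The read_fully name is hypothetical and this is not Autopsy API, just an illustration of the count-versus-buffer distinction.

```python
import io


def read_fully(stream, size):
    """Read up to `size` bytes from `stream`; return (count, buffer)."""
    buf = bytearray(size)
    view = memoryview(buf)
    total = 0
    while total < size:
        # readinto() fills the slice in place and returns how many bytes it wrote
        n = stream.readinto(view[total:])
        if not n:  # 0 (or None) means no more data is available
            break
        total += n
    return total, buf


stream = io.BytesIO(b"plain text data")
count, buf = read_fully(stream, 32)
content = bytes(buf[:count])  # only the first `count` bytes are real data
```

The return value is only useful for knowing how much of the buffer is valid; the content has to be taken from the buffer.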

Thanks @Mark_McKinnon, that was pretty close to what I had, based on the example ingest module. I realized that I was overthinking it: the content ends up in the fileBuffer array, not in filebytes. There may be a cleaner way to convert the array to text using the Jython/Java side, but this was the easiest way for me to do it in straight Python:

rawFile = ReadContentInputStream(file)
fileBuffer = jarray.zeros(file.getSize(), "b")
filebytes = rawFile.read(fileBuffer)
# For whatever reason the jarray included 2 items at the start of the array
# (-1, -2) that are not really part of the file.
byte_array = fileBuffer[2:]

fileContent = ""
for byte in byte_array:
    # jarray bytes are signed (-128..127); mask to 0..255 so chr() accepts them
    fileContent += chr(byte & 0xFF)
fileContent = fileContent.replace('\x00', '')
# Just logging for sanity checking. You wouldn't want to keep this in the module.
self.log(Level.INFO, fileContent)
self.log(Level.INFO, fileContent)
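
A note on the two leading values: read as unsigned bytes, -1 and -2 are 0xFF 0xFE, which is the UTF-16 little-endian byte order mark. Together with the NUL bytes being stripped above, that suggests the file is UTF-16 text rather than the array holding stray items. Below is a minimal sketch of decoding such a buffer by checking for a BOM first. The signed_bytes_to_text helper is a hypothetical name, the plain list of ints stands in for the jarray buffer, and it has not been tested inside Autopsy's Jython.

```python
import codecs

# (BOM, codec name) pairs, checked in order.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def signed_bytes_to_text(signed_bytes, default_encoding="utf-8"):
    """Decode a sequence of signed byte values (-128..127) to text."""
    # Map signed values back to unsigned (0..255) before building the byte string.
    raw = bytes(bytearray(b & 0xFF for b in signed_bytes))
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Drop the BOM and decode with the encoding it indicates.
            return raw[len(bom):].decode(encoding, "replace")
    # No BOM found: fall back to the assumed default encoding.
    return raw.decode(default_encoding, "replace")


# 0xFF 0xFE (as -1, -2) followed by "Hi" in UTF-16 LE
sample = [-1, -2, 72, 0, 105, 0]
text = signed_bytes_to_text(sample)
```

Decoding this way keeps non-ASCII characters intact, which slicing off two bytes and removing '\x00' would not.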