Hey Everyone,
I’m working on my first Python module and I’m wondering what the best way to read file content is. The examples I’ve seen write the file to a temporary location first, but those seem to be centered on running another executable against the file afterwards. In my case I just need the content itself: it’s plain text data that I can process within the module.
The module is a Data Source Ingest Module, and so far I’ve had no trouble collecting the files and metadata I need. I’d appreciate any advice on reading the actual content. I think it’s possible based on some of the AbstractFile object documentation, but I’m not sure of the best approach.
Thanks @Mark_McKinnon, that was pretty close to what I had, based on the example ingest module. I realized I was overthinking it: the content ends up in the fileBuffer array, not in filebytes (which is just the return value of read(), the number of bytes read). There may be a cleaner way to convert the array to text using the Jython/Java side, but this was the easiest way for me to do it in straight Python:
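# Imports needed at the top of the module file:
import jarray
from java.util.logging import Level
from org.sleuthkit.datamodel import ReadContentInputStream

rawFile = ReadContentInputStream(file)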
fileBuffer = jarray.zeros(file.getSize(), "b")
filebytes = rawFile.read(fileBuffer)
# The first two bytes (-1, -2) are 0xFF 0xFE read as signed Java bytes, which is the
# UTF-16 little-endian byte-order mark. They come from the file itself, not from jarray; skip them.
byte_array = fileBuffer[2:]
fileContent = ""
for byte in byte_array:
    # Java bytes are signed (-128..127); mask to 0..255 because chr() rejects negative values.
    fileContent += chr(byte & 0xFF)
fileContent = fileContent.replace('\x00', '')
# Just logging for a sanity check. You wouldn't want to keep this in the module.
self.log(Level.INFO, fileContent)
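On the "cleaner way" to turn the array into text: the leading (-1, -2) pair and the stray '\x00' characters both suggest the file is UTF-16 little-endian, so one option is to decode the buffer with the matching codec instead of building the string byte by byte. The following is only a rough sketch in plain Python, not something taken from the Autopsy examples. The helper name decode_signed_bytes and its default_encoding parameter are made up for illustration, and it assumes the buffer iterates as signed integers the way fileBuffer does above.

```python
import codecs

# Byte-order marks this sketch recognizes, paired with the codec to use.
# UTF-32 is not handled here; its little-endian BOM also starts with 0xFF 0xFE.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_signed_bytes(signed_bytes, default_encoding="utf-8"):
    """Decode a sequence of signed byte values (-128..127) into text.

    Hypothetical helper: it picks the codec from a leading byte-order mark
    and falls back to default_encoding when no known mark is present.
    """
    # Convert signed Java-style bytes to unsigned values 0..255.
    raw = bytearray(b & 0xFF for b in signed_bytes)
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Drop the mark itself and decode the remainder.
            return raw[len(bom):].decode(encoding)
    # No recognized mark: decode leniently rather than raising.
    return raw.decode(default_encoding, "replace")
```

With something like this, the whole fileBuffer could be passed in as-is, e.g. fileContent = decode_signed_bytes(fileBuffer), without slicing off the first two items or stripping '\x00' afterwards. It would also keep non-ASCII characters intact, which the chr() loop does not.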