example of log file:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
It will generate 5 files:
1.timestamp.column
2.Field1.column
3.Field_Doc.f1.column
4.Field_Doc.f2.column
5.Field_Doc.f3.column
Example content of timestamp.column:
2022-01-14T00:12:21.000
2022-01-18T00:15:51.000
I'm facing a problem while the values of keys are null, undefined as when the value us is null for example:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}
can someone help me out here?
CodePudding user response:
Note, the input file is actually an NDJSON. See the docs.
That being said, since furas already gave an excellent answer on how to process the NDJSON logfile I'm going to skip that part. Do note that there's a library to deal with NDJSON files. See PyPI.
His code needs minimal adjustment to deal with the undefined
edge case. The null
value is a valid JSON value, so his code doesn't break on that.
You can fix this easily by a string.replace()
while doing the json.loads()
so it becomes valid JSON, and then you can check while writing if value == None
to replace the value with an empty string. Note that None
is the python equivalent of JSON's null
.
Please note the inclusion of :
in the replace function, it's to prevent false negatives...
main loop logic
for line in file_obj:
# the replace function makes it valid JSON
data = json.loads(line.replace(': undefined', ': null'))
print(data)
process_dict(data, write_func)
write_func() function adjustment
def write_func(key, value):
with open(key '.column', "a") as f:
# if the value == None, make it an empty string.
if value == None:
value = ''
f.write(str(value) "\n")
I used the following as the input string:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}