I have many json
files under /json/reports/
location and for each json
file the output need to be converted into csv
file individually.
I have the following python code to convert.
import pandas as pd
import glob
path = '/json/reports/*.json'
files = glob.glob(path)
for file in files:
f = open(file, 'r')
jsonData = pd.read_json(f.read())
jsonData.to_csv(f.name ".csv")
f.close()
One of my json file output as follows.
[
{
"api_result": "KEY_NAME",
"ml_result": "VALUE",
"line_data_list": [
{
"line": "54A2FF607A6dsewroadeEOERD> |-",
"line_num": 9053,
"path": "/home/user/src/common/race/flow/prog_flow.mk",
"value": "WOERJFOQDKSDFKKASDF0",
"variable": null,
"entropy_validation": true
}
],
"ml_part": 0.994396984577179,
"rule": "GCP Client ID",
"severity": "high"
},
.....
.......
..........
Problem:-:-
The above python code writing line_data_list
list values (line
, line_num
, path
, value
, variable
, & entropy_validation
) in single column, but I need each value in a seprate column. (Ie specified in below format).
Expected output csv per json file:-
Sl.no | api_result | ml_result | line_data_list | line | line_num | path | value | variable | entropy_validation | ml_part | rule | severity |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | KEY_NAME | VALUE | 54A2FF607A6dsewroadeEOERD | 9053 | /home/user98/src/common/race/flow/prog_flow.mk | WOERJFOQDKSDFKKASDFO | null | TRUE | 0.994396985 | GCP Client ID | high | |
2 | ||||||||||||
3 |
Need help to print each in separate column.
CodePudding user response:
You need to unpack your line_data_list key-value pairs so they occur on the same level as your other columns. Something like what I've written below would work.
import pandas as pd
import glob
import json
path = '/json/reports/*.json'
files = glob.glob(path)
for file in files:
f = open(file, 'r')
json_dict = json.load(f)
line_data = json_dict[0].pop("line_data_list")
{json_dict.update(header, val) for header, val in line_data.items()}
jsonData = pd.from_dict(json_dict, orient="records")
jsonData.to_csv(f.name ".csv")
f.close()
CodePudding user response:
Use:
j1 = '''[
{
"api_result": "KEY_NAME",
"ml_result": "VALUE",
"line_data_list": [
{
"line": "54A2FF607A6dsewroadeEOERD> |-",
"line_num": 9053,
"path": "/home/user/src/common/race/flow/prog_flow.mk",
"value": "WOERJFOQDKSDFKKASDF0",
"variable": "null",
"entropy_validation": "True"}
],
"ml_part": 0.994396984577179,
"rule": "GCP Client ID",
"severity": "high"
}]'''
temp = eval(j1)[0]
temp['line_data_list'] = temp['line_data_list'][0]
pd.json_normalize(temp)
Output:
api_result ml_result ml_part rule severity line_data_list.line line_data_list.line_num line_data_list.path line_data_list.value line_data_list.variable line_data_list.entropy_validation
0 KEY_NAME VALUE 0.994397 GCP Client ID high 54A2FF607A6dsewroadeEOERD> |- 9053 /home/user/src/common/race/flow/prog_flow.mk WOERJFOQDKSDFKKASDF0 null True
1