json data into individual csv file-CodePudding

I have many json files under /json/reports/ location and for each json file the output need to be converted into csv file individually.

I have the following python code to convert.

import pandas as pd
import glob
path = '/json/reports/*.json'
files = glob.glob(path)
for file in files:
    f = open(file, 'r')
    jsonData = pd.read_json(f.read())
    jsonData.to_csv(f.name ".csv")
    f.close()

One of my json file output as follows.

 [
    {
        "api_result": "KEY_NAME",
        "ml_result": "VALUE",
        "line_data_list": [
            {
                "line": "54A2FF607A6dsewroadeEOERD> |-",
                "line_num": 9053,
                "path": "/home/user/src/common/race/flow/prog_flow.mk",
                "value": "WOERJFOQDKSDFKKASDF0",
                "variable": null,
                "entropy_validation": true
            }
        ],
        "ml_part": 0.994396984577179,
        "rule": "GCP Client ID",
        "severity": "high"
    },
.....
.......
..........

Problem:-:-

The above python code writing line_data_list list values (line, line_num, path, value, variable, & entropy_validation) in single column, but I need each value in a seprate column. (Ie specified in below format).

Expected output csv per json file:-

Sl.no	api_result	ml_result	line	line_num	path	value	variable	entropy_validation	ml_part	rule	severity
1	KEY_NAME	VALUE	54A2FF607A6dsewroadeEOERD	9053	/home/user98/src/common/race/flow/prog_flow.mk	WOERJFOQDKSDFKKASDFO	null	TRUE	0.994396985	GCP Client ID	high
2
3

Need help to print each in separate column.

CodePudding user response：

You need to unpack your line_data_list key-value pairs so they occur on the same level as your other columns. Something like what I've written below would work.

import pandas as pd
import glob
import json

path = '/json/reports/*.json'
files = glob.glob(path)
for file in files:
    f = open(file, 'r')
    json_dict = json.load(f)
    line_data = json_dict[0].pop("line_data_list")
    {json_dict.update(header, val) for header, val in line_data.items()}
    jsonData = pd.from_dict(json_dict, orient="records")
    jsonData.to_csv(f.name ".csv")
    f.close()

CodePudding user response：

Use:

j1 = '''[
    {
        "api_result": "KEY_NAME",
        "ml_result": "VALUE",
        "line_data_list": [
            {
                "line": "54A2FF607A6dsewroadeEOERD> |-",
                "line_num": 9053,
                "path": "/home/user/src/common/race/flow/prog_flow.mk",
                "value": "WOERJFOQDKSDFKKASDF0",
                "variable": "null",
                "entropy_validation": "True"}
        ],
        "ml_part": 0.994396984577179,
        "rule": "GCP Client ID",
        "severity": "high"
    }]'''
temp = eval(j1)[0]
temp['line_data_list'] = temp['line_data_list'][0]
pd.json_normalize(temp)

Output:

api_result  ml_result   ml_part rule    severity    line_data_list.line line_data_list.line_num line_data_list.path line_data_list.value    line_data_list.variable line_data_list.entropy_validation
0   KEY_NAME    VALUE   0.994397    GCP Client ID   high    54A2FF607A6dsewroadeEOERD> |-   9053    /home/user/src/common/race/flow/prog_flow.mk    WOERJFOQDKSDFKKASDF0    null    True
1