Home > Software engineering >  Convert nested JSON from a CSV into a Pandas dataframe
Convert nested JSON from a CSV into a Pandas dataframe

Time:10-14

I am desperately trying to convert a nested JSON feature within a CSV into data frame rows. Could you help?

Sample CSV row

2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settingsid=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v

Desired output (using JSON part only):

id message request httpCode service timestamp
0 Success (Cached) GET /api/v1/settings?id=3 200 stats-vis-backend 2021-09-26 08:25:43

If this would be the data frame output structure, I would be more than happy. I tried JSON normalize etc., but I am far away from a solution.

Thanks so much!!

Best David

Full Code trial (based on SeaBean):


import csv
import ast
import pandas as pd

# read CSV 
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)

print(df.head(1))

# convert string of JSON/dict to real JSON/dict 
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

print(df_json.head(1))

Full output dump

File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
    df[1] = df[1].apply(ast.literal_eval)

  File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    > next start
    ^
SyntaxError: invalid syntax

Sample Output df1

0         {"level":"info","message":"Success (Cached)","...
1         {"level":"info","message":"Success (Cached)","...
2         {"level":"info","message":"Success (Cached)","...
3         {"level":"info","message":"Success","request":...
4         {"level":"info","message":"Success (Cached)","...
                                ...                        
249995    {"level":"info","message":"Success (Cached)","...
249996    {"level":"info","message":"Success (Cached)","...
249997    {"level":"info","message":"Success (Cached)","...
249998    {"level":"info","message":"Success","request":...
249999    {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object

Sample toDict() Output of df1

{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
 1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',

Output print(df.iloc[[4480]])

                               0             1  \
4480  2021-09-26T12:00:58.983344643Z  > next start   

                                                   2  \
4480  ip-10-xxx-xxxx-30.eu-central-1.compute.internal   

                                  3  
4480  xxxx-converter-75ffxf6b-jq2w7  

CodePudding user response:

You can use pd.DataFrame on the list of column values of the second column (with JSON) after converting the string of JSON to real JSON (not in string), as follows:

# read CSV 
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)

# convert string of JSON/dict to real JSON/dict 
import ast
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

If you have already read the CSV into dataframe with column header, you can also use the column label of the second column instead of 1 for the column label for second column in the codes above.

Result:

print(df_json)


  level           message                   request  httpCode            service            timestamp
0  info  Success (Cached)  GET /api/v1/settingsid=3       200  stats-vis-backend  2021-09-26 08:25:43
  • Related