Home > Enterprise >  Parsing json in csv in pandas not working
Parsing json in csv in pandas not working

Time:04-20

I have this data:

#my.csv
"accountid","configuration"
"797847293","{'dBInstanceIdentifier': 'grafanadb', 'dBInstanceClass': 'db.t3.micro', 'engine': 'postgres', 'dBInstanceStatus': 'available'}"

How can I bring into format like:

# result.csv
accountid,dBInstanceIdentifier,dbInstanceClass,engine
797847293,grafanadb,db.t3.micro,engine

I tried it with python pandas, but not result so far respectively errors.

Any ideas?

CodePudding user response:

Assuming you've gotten this csv file from some data source, it shouldn't be this inconsistent in its formatting and should have a double quote " at the end of the second line which I believe you've missed out when copying over. Like this:

#my.csv
"accountid","configuration"
"797847293","{'dBInstanceIdentifier': 'grafanadb', 'dBInstanceClass': 'db.t3.micro', 'engine': 'postgres', 'dBInstanceStatus': 'available'}"

If that is the case, you can simply specify the quotechar argument in pd.read_csv to tell it to ignore delimiters inside the double quotes "

df = pd.read_csv('a.csv', quotechar='"')

You can then do your processing in pandas before writing back into csv from here.

CodePudding user response:

You can read the csv to dataframe and convert configuration to dict with ast.literal_eval. Then use pd.json_normalize to convert a column of dictionary into columns.

import ast

df = pd.read_csv('data.csv', quotechar='"')

out = pd.concat([df['accountid'], pd.json_normalize(df['configuration'].apply(ast.literal_eval))], axis=1)
print(out)

   accountid dBInstanceIdentifier dBInstanceClass    engine dBInstanceStatus
0  797847293            grafanadb     db.t3.micro  postgres        available
  • Related