I have this data:
#my.csv
"accountid","configuration"
"797847293","{'dBInstanceIdentifier': 'grafanadb', 'dBInstanceClass': 'db.t3.micro', 'engine': 'postgres', 'dBInstanceStatus': 'available'}"
How can I bring into format like:
# result.csv
accountid,dBInstanceIdentifier,dbInstanceClass,engine
797847293,grafanadb,db.t3.micro,engine
I tried it with python pandas, but not result so far respectively errors.
Any ideas?
CodePudding user response:
Assuming you've gotten this csv file from some data source, it shouldn't be this inconsistent in its formatting and should have a double quote "
at the end of the second line which I believe you've missed out when copying over. Like this:
#my.csv
"accountid","configuration"
"797847293","{'dBInstanceIdentifier': 'grafanadb', 'dBInstanceClass': 'db.t3.micro', 'engine': 'postgres', 'dBInstanceStatus': 'available'}"
If that is the case, you can simply specify the quotechar
argument in pd.read_csv
to tell it to ignore delimiters inside the double quotes "
df = pd.read_csv('a.csv', quotechar='"')
You can then do your processing in pandas before writing back into csv from here.
CodePudding user response:
You can read the csv to dataframe and convert configuration
to dict with ast.literal_eval
. Then use pd.json_normalize
to convert a column of dictionary into columns.
import ast
df = pd.read_csv('data.csv', quotechar='"')
out = pd.concat([df['accountid'], pd.json_normalize(df['configuration'].apply(ast.literal_eval))], axis=1)
print(out)
accountid dBInstanceIdentifier dBInstanceClass engine dBInstanceStatus
0 797847293 grafanadb db.t3.micro postgres available