I have the following csv file (converted to Dataframe)
asOfDate,name,details |
2022-06-06,Vasanth,{"degree":"BE", "score":72.0, "subject" :"ECE"}
And then, using the above file, I need to convert to the following using pandas
asOfDate,name,details.degree,details.score,details.subject
2022-06-06,Vasanth,BE,72.0,ECE
I tried to use json_normalize, but getting error "str" has no attribute values
Please let me know how to convert into prescribed format
CodePudding user response:
You can expand the json field to columns by applying json.loads
to it, convert the values to a list and then join
that to the rest of the dataframe, dropping the redundant details
column:
df.join(pd.DataFrame(df['details'].apply(json.loads).tolist())).drop('details',axis=1)
Output:
asOfDate name degree score subject
0 2022-06-06 Vasanth BE 72.0 ECE
CodePudding user response:
I think you received the error because the data in the details column is still a sting
not a json/dict
object.
Here we print the type of the first element in the details
column
df = pd.read_csv('df.csv')
print(type(df['details'][0]))
<class 'str'>
- So you can simply convert it to
json/dict
object as Nick suggested
df['details'] = df['details'].apply(json.loads)
print(type(df['details'][0]))
<class 'dict'>
- Use
json_normalize
to convert thejson
objects toDataFrame
df2 = pd.json_normalize(df['details'])
print(df2)
degree score subject
0 BE 72.0 ECE
1 BE 72.0 ECE
- we join the 2 dataframes and remove the
details
column
df = df.join(df2).drop('details', axis=1)
print(df)
asOfDate name degree score subject
0 2022-06-06 Vasanth BE 72.0 ECE
1 2022-06-06 Vasanth BE 72.0 ECE
finally all in one line
import json
import pandas as pd
df = pd.read_csv('df.csv')
df = df.join(pd.json_normalize(df['details'].apply(json.loads))).drop('details', axis=1)
print(df)
asOfDate name degree score subject
0 2022-06-06 Vasanth BE 72.0 ECE
1 2022-06-06 Vasanth BE 72.0 ECE