Identifier Properties
1 [{"$id":"2","SMName":"pia.redoabs.com","Type":"sms"},{"$id":"3","Name":"_18_Lucene41_0.doc","Type":"file"}]
2 [{"$id":"2","SMName":"pred.redocad.com","Type":"sms"},{"$id":"3","Name":"_18_Nil41_0.doc","Type":"file"}]
3 [{"$id":"2","SMName":"promomaster.com","Type":"sms"},{"$id":"3","Name":"_17_Litre41_0.doc","Type":"file"}]
4 [{"$id":"2","SMName":"admaster.com","Type":"sms"},{"$id":"3","Name":"_k.pos","Type":"file"}]
5 [{"$id":"2","SMName":"plan.com.com","Type":"sms"},{"$id":"3","Name":"_3_Lucene41_0.doc","Type":"file"}]
This is a dataframe with 2 columns "Identifier" & "Properties"> The "Properties" column appears as a list of json.The aim is to create 2 different columns for "sms" & "file". Tried json_normalize to create 2 different columns . But that did not work. I do not have lot of experience with jsons. Can someone please help ?
CodePudding user response:
You can try:
import json
df["Properties"] = df["Properties"].apply(
lambda x: {
d["Type"]: (d["SMName"] if d["Type"] == "sms" else d["Name"])
for d in json.loads(x)
}
)
df = pd.concat([df, df.pop("Properties").apply(pd.Series)], axis=1)
print(df)
Prints:
Identifier sms file
0 1 pia.redoabs.com _18_Lucene41_0.doc
1 2 pred.redocad.com _18_Nil41_0.doc
2 3 promomaster.com _17_Litre41_0.doc
3 4 admaster.com _k.pos
4 5 plan.com.com _3_Lucene41_0.doc