Extracting information from a list of json Python-CodePudding

Identifier Properties  
   1       [{"$id":"2","SMName":"pia.redoabs.com","Type":"sms"},{"$id":"3","Name":"_18_Lucene41_0.doc","Type":"file"}]
   2       [{"$id":"2","SMName":"pred.redocad.com","Type":"sms"},{"$id":"3","Name":"_18_Nil41_0.doc","Type":"file"}]
   3      [{"$id":"2","SMName":"promomaster.com","Type":"sms"},{"$id":"3","Name":"_17_Litre41_0.doc","Type":"file"}]
   4      [{"$id":"2","SMName":"admaster.com","Type":"sms"},{"$id":"3","Name":"_k.pos","Type":"file"}]
   5      [{"$id":"2","SMName":"plan.com.com","Type":"sms"},{"$id":"3","Name":"_3_Lucene41_0.doc","Type":"file"}]

This is a dataframe with 2 columns "Identifier" & "Properties"> The "Properties" column appears as a list of json.The aim is to create 2 different columns for "sms" & "file". Tried json_normalize to create 2 different columns . But that did not work. I do not have lot of experience with jsons. Can someone please help ?

CodePudding user response：

You can try:

import json

df["Properties"] = df["Properties"].apply(
    lambda x: {
        d["Type"]: (d["SMName"] if d["Type"] == "sms" else d["Name"])
        for d in json.loads(x)
    }
)

df = pd.concat([df, df.pop("Properties").apply(pd.Series)], axis=1)

print(df)

Prints:

   Identifier               sms                file
0           1   pia.redoabs.com  _18_Lucene41_0.doc
1           2  pred.redocad.com     _18_Nil41_0.doc
2           3   promomaster.com   _17_Litre41_0.doc
3           4      admaster.com              _k.pos
4           5      plan.com.com   _3_Lucene41_0.doc