I have below dataframe in Pandas:
Name | Branch | Class | Details |
---|---|---|---|
Vicky | CSE | IV | [ {“id” : “1234”,“entityType”:{ Name:”Parent” ,Type:”entity”},“name”:”Vikas”},{ “id” : “8974”, “entityType”:{Name:”Parent1”,Type:”entity1”},“name”:”Sachin”},{“id” : 5678”,“entityType”:{Name:”Parent2”,Type:”entity2” },“name”:”Sehwag”}] |
Now, The 4th Column has nested JSON String, i am trying to fetch just the Name field from the nested JSON , as shown below:
Name | Branch | Class | Details.0.Name | Details.1.Name | Details.2.Name |
---|---|---|---|---|---|
Vicky | CSE | IV | Vikas | Sachin | Sehwag |
Is there any way to do this using pandas ?
CodePudding user response:
-------Updated Answer after Question Revised-----------
Knowing that you want to extract the name field for all of the objects in the Details column, you could try the below:
df["Detail_Names"] = df["Details"].apply(lambda x: [o["name"] for o in x])
df2 = pd.DataFrame(df["Detail_Names"].tolist())
df.merge(df2, left_index=True, right_index=True, how="outer").drop(["Details", "Detail_Names"], axis=1)
And if your "Details" column actually contains a string of JSON, then the first line would change to the following in order to convert the JSON string into a list first:
df["Details"] = df["Details"].apply(lambda x: [o["name"] for o in json.loads(x)])
(Note that I didn't bother rename the new columns.)
-----------Original Answer-----------------
If your "Details" column is a string data type that contains a JSON encoded string, then you might be able to use the following to extract into a new Series the "name" property of the 3rd object, using the built-in json library to decode the JSON string in each row to an object:
df["Details"].apply(lambda x: json.loads(x)[2]["name"])
On the other hand, if your column already contains a dict object from decoding of the json string, then you could do this:
df["Details"].apply(lambda x: x[2]["name"])
Here is a prior SO thread that might also provide some alternative ways to work with JSON in a DataFrame that use Pandas built-in methods, but I'm not sure if this works with a JSON array - it might only work with a JSON object in the column/Series. Looks like this is focused on splitting all the JSON elements into new columns, rather than extracting a specific one, but maybe there is a way to use this. If so, it might be more performant than my above suggestions using the apply function, particularly if you have a very large DataFrame.