Home > Enterprise >  How to fetch one value from one JSON column of dataframe pandas
How to fetch one value from one JSON column of dataframe pandas

Time:12-25

I have below dataframe in Pandas:

Name Branch Class Details
Vicky CSE IV [ {“id” : “1234”,“entityType”:{ Name:”Parent” ,Type:”entity”},“name”:”Vikas”},{ “id” : “8974”, “entityType”:{Name:”Parent1”,Type:”entity1”},“name”:”Sachin”},{“id” : 5678”,“entityType”:{Name:”Parent2”,Type:”entity2” },“name”:”Sehwag”}]

Now, The 4th Column has nested JSON String, i am trying to fetch just the Name field from the nested JSON , as shown below:

Name Branch Class Details.0.Name Details.1.Name Details.2.Name
Vicky CSE IV Vikas Sachin Sehwag

Is there any way to do this using pandas ?

CodePudding user response:

-------Updated Answer after Question Revised-----------

Knowing that you want to extract the name field for all of the objects in the Details column, you could try the below:

df["Detail_Names"] = df["Details"].apply(lambda x: [o["name"] for o in x])
df2 = pd.DataFrame(df["Detail_Names"].tolist())
df.merge(df2, left_index=True, right_index=True, how="outer").drop(["Details", "Detail_Names"], axis=1)

And if your "Details" column actually contains a string of JSON, then the first line would change to the following in order to convert the JSON string into a list first:

df["Details"] = df["Details"].apply(lambda x: [o["name"] for o in json.loads(x)])

(Note that I didn't bother rename the new columns.)

-----------Original Answer-----------------

If your "Details" column is a string data type that contains a JSON encoded string, then you might be able to use the following to extract into a new Series the "name" property of the 3rd object, using the built-in json library to decode the JSON string in each row to an object:

df["Details"].apply(lambda x: json.loads(x)[2]["name"])

On the other hand, if your column already contains a dict object from decoding of the json string, then you could do this:

df["Details"].apply(lambda x: x[2]["name"])

Here is a prior SO thread that might also provide some alternative ways to work with JSON in a DataFrame that use Pandas built-in methods, but I'm not sure if this works with a JSON array - it might only work with a JSON object in the column/Series. Looks like this is focused on splitting all the JSON elements into new columns, rather than extracting a specific one, but maybe there is a way to use this. If so, it might be more performant than my above suggestions using the apply function, particularly if you have a very large DataFrame.

  • Related