Home > Blockchain >  Parsing a dataframe with a JSON structure on one of the columns
Parsing a dataframe with a JSON structure on one of the columns

Time:04-12

I have a pandas dataframe in the format below where JSON column is a JSON structure. The goal is to find a Schedule8812 key nested under in ReturnData key object in the JSON column. If a Schedule8812 key is found, it updates the Schedule8812 column to True for that row, and False otherwise.

enter image description here

So far, I've been able to accomplish this by calling the function below.

import pyodbc
import json
from pprint import pprint
import pandas as pd

def iloc_df(df,col):  
    for i in df.index:  
        jdata=json.loads(df[col].iloc[i])
        # print(jdata)
        if 'Schedule8812' in jdata['ReturnData']:
            df.at[i,'Schedule8812'] = True

However, as the rows in my dataframe grows, the longer it completes. I'd like to switch it to a lambda expression but have been getting an error. Can someone point out to me a good reference on how to solve my lambda expression below thats getting an error?

query['Schedule8812']=query['Schedule8812'].apply(lambda x:True if 'Schedule8812' in json.loads(query.JSON.to_string())['ReturnData'] else False)

Below is the error I'm getting. enter image description here

CodePudding user response:

You can test JSON column and in lambda use x instead query.JSON:

query['Schedule8812']=query['JSON'].apply(lambda x: 'Schedule8812' in json.loads(x.to_string())['ReturnData'])

Or if possible test subtring in column JSON use Series.str.contains:

query['Schedule8812']=query['JSON'].str.contains('Schedule8812')
  • Related