How to extract key value from deep dictionary in pandas || Python || dataframe

Time:10-03

I am making a request call and storing the response as JSON, then loading that JSON into a pandas DataFrame, which works well. However, a few columns in the DataFrame contain deeply nested dictionaries, and I am unable to extract key values from them. I am attaching a CSV file with a few of the columns; the important one is the "guest" column.

I have searched online and tried so many things that I am now confused about which approaches are correct. Below is a snapshot of my code and my attempts.

import re

import pandas as pd

## response is the result of the request call described above.
Adata = response.json()

## Loading the Json Data to DataFrame
df = pd.DataFrame(Adata)
df = df.astype(str)

## Exporting the Dataframe to csv file.
df.to_csv('Appointments.csv')

## Trying to create a new column with key values that I want out of guest column.
AB = df[['guest']]
print(AB)

BA = df['guest'].str.strip().to_frame()
print(BA)

BA.to_csv('BA_sheet.csv')

##Loaded single row and tried to check if I can do something about it.
test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}}
df3 = pd.DataFrame(test)
#print (df3)
df3.to_csv('df3_testsheet.csv')


## Trying to lambda function to extract the data that I want.
AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name'])
print(AB)

## Trying regex to get the desired data.
pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB))
print(pp)

All I want is to extract id, first_name and last_name from the dictionary in the guest column. Use this link to access the CSV file which has the DataFrame result.

CodePudding user response:

The way you're doing it, you're trying to extract the first_name, last_name and id keys from a str representation of a dict, because df.astype(str) converted every dict in the guest column into a string. You can convert each string back to a dict using the eval builtin (not recommended unless you fully trust where the data comes from), or, more safely, the ast.literal_eval function from the ast module, which only accepts Python literals.

import ast

df['guest'] = df['guest'].apply(ast.literal_eval)
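
To make the cause visible, here is a minimal sketch of the round trip. The sample row is a shortened, hypothetical version of the guest dictionary from the question, and parse_guest is an illustrative helper, not part of pandas. It assumes that cells which cannot be parsed (for example the string 'nan' left behind by astype(str) on a missing value) should become None instead of raising an error.

```python
import ast

import pandas as pd

# Hypothetical sample row modeled on the question's "guest" column.
df = pd.DataFrame({
    "guest": [
        {"id": "4b75bc9a", "first_name": "ASHISH ", "last_name": "PATEL", "email": None},
    ],
})

# astype(str) replaces each dict with its string representation,
# so key lookups such as value['id'] no longer work.
as_text = df.astype(str)
print(type(as_text.loc[0, "guest"]))


def parse_guest(value):
    """Parse a stringified dict; return None for anything that is not one."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return None
    return parsed if isinstance(parsed, dict) else None


# Convert the strings back into real dict objects.
restored = as_text["guest"].apply(parse_guest)
print(restored[0]["last_name"])
```

If every cell is known to be a well-formed dict string, applying ast.literal_eval directly, as in the line above, is enough; the wrapper only matters when some rows are empty or malformed.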

Once you have the guest dictionaries as dict objects, you can simply apply pd.Series to expand them into a separate DataFrame, with one column per key:

guest_df = df['guest'].apply(pd.Series)

guest_df['id'] # => gives you id
guest_df['first_name'] # => gives you first name
guest_df['last_name'] # => gives you last name
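
As an alternative to the approach above, the string conversion can be avoided altogether by flattening the parsed JSON before it is turned into strings. The following is only a sketch using pd.json_normalize; the records list is a hypothetical stand-in for response.json(), and the appointment_id field is invented for illustration. It assumes the response is a list of records that each hold a nested guest dictionary.

```python
import pandas as pd

# Hypothetical stand-in for response.json(): one dict per appointment.
records = [
    {
        "appointment_id": "a1",
        "guest": {"id": "g1", "first_name": "ASHISH ", "last_name": "PATEL",
                  "mobile": {"country_id": 0, "number": None}},
    },
    {
        "appointment_id": "a2",
        "guest": {"id": "g2", "first_name": "Jane", "last_name": "Doe",
                  "mobile": {"country_id": 0, "number": None}},
    },
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. "guest.id" and "guest.mobile.number".
flat = pd.json_normalize(records)

# Keep only the three guest fields and drop the "guest." prefix.
wanted = flat[["guest.id", "guest.first_name", "guest.last_name"]].rename(
    columns=lambda name: name.split(".", 1)[1]
)

# The sample data has trailing whitespace in first_name, so strip it.
wanted["first_name"] = wanted["first_name"].str.strip()
print(wanted)
```

This keeps the nested values as real Python objects, so no parsing step is needed; if the data must be written to CSV, the flattened columns export cleanly as plain values.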