I am making a request call and storing data into JSON, and from there I am loading JSON to pandas DataFrame, good thing is it works like magic. However, unfortunately, I have deep dictionaries available in a few columns in the data frame. I am unable to extract key values from it. I am attaching the CSV file with a few columns and the important one is the "guest" column.
I have been looking on the internet and have tried so many things that by now I am so confused about what all is correct and incorrect. below is the snapshot of my code and trials.
Adata = response.json()
## Loading the Json Data to DataFrame
df = pd.DataFrame(Adata)
df = df.astype(str)
## Exporting the Dataframe to csv file.
df.to_csv('Appointments.csv')
## Trying to create a new column with key values that I want out of guest column.
AB = df[['guest']]
print(AB)
BA = df['guest'].str.strip().to_frame()
print(BA)
BA.to_csv('BA_sheet.csv')
##Loaded single row and tried to check if I can do something about it.
test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}}
df3 = pd.DataFrame(test)
#print (df3)
df3.to_csv('df3_testsheet.csv')
## Trying to lambda function to extract the data that I want.
AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name'])
print(AB)
## Trying regex to get the desired data.
pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB))
print(pp)
All I want is to extract id
, first_name
and the last_name
from the dictionary from that guest column. Use this link to access the csv file which has the DataFrame result.
CodePudding user response:
The way you're doing it, you're trying to extract your first_name
, last_name
and id
keys from a str representation of a dict. You can convert it back to a dict using the eval
builtin (not recommended if you're not sure of where the data is coming from), or the ast.literal_eval
function from the ast
module.
import ast
df['guest'] = df['guest'].apply(ast.literal_eval)
Once you have the guest dictionaries as dict objects, you can simply apply pd.Series
to convert it to a separate DataFrame
guest_df = df['guest'].apply(pd.Series)
guest_df['id'] # => gives you id
guest_df['first_name'] # => gives you first name
guest_df['last_name'] # => gives you last name