The zendesk api returns fields as a list of dictionaries but each list is a single record. I'm wondering if there is a better way to turn it all into a dataframe. If it's a dictionary of dictionaries then json_normalize
takes care of it with no problem.
Caveat: not all records will have the same field IDs
Sample data:
data = [{
"ticket_id": 4,
"customer_id": 8,
"created_at": "2022-05-01",
"custom_fields": [
{
"id": 15,
"value": "website"
},
{
"id": 16,
"value": "broken"
},
{
"id": 23,
"value": None
},
],
'group_id': 42
}]
Running any form of Dataframe, from_records, from_json, or json_normalizes gives most of what I want but with the list in a single column:
t_df = pd.json_normalize(data)
t_df
Output:
ticket_id | customer_id | created_at | custom_fields | group_id | |
---|---|---|---|---|---|
0 | 4 | 8 | 2022-05-01 | [{'id': 15, 'value': 'website'}, {'id': 16, 'v... | 42 |
My current, probably ill-advised, solution is:
t_df = pd.DataFrame(sample_df.at[0, 'custom_fields']).T.reset_index(drop=True)
t_df.rename(columns=t_df.iloc[0], inplace=True)
t_df.drop(0, inplace=True)
t_df.reset_index(drop=True, inplace=True)
pd.merge(left=sample_df, left_index=True,
right=t_df, right_index=True).drop(columns='custom_fields')
This results in a correct record that I could append to a main dataframe:
ticket_id | customer_id | created_at | group_id | 15 | 16 | 23 | |
---|---|---|---|---|---|---|---|
0 | 4 | 8 | 2022-05-01 | 42 | website | broken | None |
My worry is that i need to do this to ~25,000 records and this seems like it will be both slow and brittle (prone to breaking).
CodePudding user response:
You should wrangle the data/dictionary first and only then construct a DataFrame with it. It will make your life easier and is faster than trying to manipulate the data with pandas
i.e. after the DataFrame is created.
import pandas as pd
data = [{
"ticket_id": 4,
"customer_id": 8,
"created_at": "2022-05-01",
"custom_fields": [
{
"id": 15,
"value": "website"
},
{
"id": 16,
"value": "broken"
},
{
"id": 23,
"value": None
},
],
'group_id': 42
}]
custom_fields = data[0].pop('custom_fields')
data[0].update({rec['id']: rec['value'] for rec in custom_fields})
t_df = pd.DataFrame(data)
Output:
>>> t_df
ticket_id customer_id created_at group_id 15 16 23
0 4 8 2022-05-01 42 website broken None
CodePudding user response:
import pandas as pd
data = [{
"ticket_id": 4,
"customer_id": 8,
"created_at": "2022-05-01",
"custom_fields": [
{
"id": 15,
"value": "website"
},
{
"id": 16,
"value": "broken"
},
{
"id": 23,
"value": None
},
],
'group_id': 42
}]
df = pd.DataFrame(data)
for index in df.index:
for i in df.loc[index,'custom_fields']:
df.loc[index,i['id']] = i['value']
df.drop(columns = 'custom_fields',inplace = True)
df