I have a json column in a pandas dataframe and I need to create a new column based on a value in the json column.
case# json_col
123 [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'Medium', 'id': 32}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
321 [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'High', 'id': 12}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
I need to extract the value from the priority as
case# Priority
123 Medium
321 High
Edit: As there are multiple name field in the json, What is the best way to extract name that has type 'priority'
CodePudding user response:
You can use pandas build in pd.json_normalize() and limit the column only to the ones that you need
import pandas as pd
j = [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'Medium', 'id': 32}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
df = pd.json_normalize(j)
df[['by.name']]
CodePudding user response:
You can try loop the list and extract name
whose type
is priority
then get first item from the output.
df['Priority'] = df['json_col'].apply(lambda lst: next(iter([d['by']['name'] for d in lst if d['by']['type'] == 'priority'])), None)
print(df)
case# json_col Priority
0 123 [{'priority_change': None, 'category_change': ... Medium
1 321 [{'priority_change': None, 'category_change': ... High