Home > Software design >  Extract a value inside a json column in pandas
Extract a value inside a json column in pandas

Time:05-14

I have a json column in a pandas dataframe and I need to create a new column based on a value in the json column.

case#             json_col

123     [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'Medium', 'id': 32}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
321     [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'High', 'id': 12}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]

I need to extract the value from the priority as

case#  Priority
 123    Medium
 321    High

Edit: As there are multiple name field in the json, What is the best way to extract name that has type 'priority'

CodePudding user response:

You can use pandas build in pd.json_normalize() and limit the column only to the ones that you need

import pandas as pd
j = [{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '', 'due_date_change': None, 'by': {'email': 'test.com', 'type': 'staff', 'id': 23, 'name': 'test.com'}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None},{'priority_change': None, 'category_change': None, 'custom_field_change': None, 'timestamp': '2022-04-18 21:36:00', 'due_date_change': None, 'by': {'type': 'priority', 'name': 'Medium', 'id': 32}, 'time_spent': None, 'update_id': 234, 'message': None, 'assignee_change': None, 'status_change': None, 'satisfaction_survey': None}]
df = pd.json_normalize(j)
df[['by.name']]

CodePudding user response:

You can try loop the list and extract name whose type is priority then get first item from the output.

df['Priority'] = df['json_col'].apply(lambda lst: next(iter([d['by']['name'] for d in lst if d['by']['type'] == 'priority'])), None)
print(df)

   case#                                           json_col Priority
0    123  [{'priority_change': None, 'category_change': ...   Medium
1    321  [{'priority_change': None, 'category_change': ...     High
  • Related