Resizing a dataframe to account for new values that are extracted from a json in the column

Time:01-23

I extracted data via the GitHub API and then used pd.json_normalize to flatten it into a dataframe. Unfortunately, some of the data is still stored as nested dictionaries inside a column. I'm able to extract the value from a single dictionary, but the problem arises when a cell contains more than one dictionary.


How do I manipulate the dataframe so that it resizes to account for the additional values?

Like this:

[image: desired outcome]

CodePudding user response:

To reproduce your problem, suppose we have this dataframe:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2],
                   'Pull.Request.Files.Nodes': [[{'path':'example 1'}], [{'path':'example 2'}, {'path':'example 3'}]],
                   })
df
   ID                        Pull.Request.Files.Nodes
0   1                         [{'path': 'example 1'}]
1   2  [{'path': 'example 2'}, {'path': 'example 3'}]

We can explode the column 'Pull.Request.Files.Nodes' so that each dictionary in a list gets its own row (the other columns, such as ID, are repeated), and then apply a lambda function to pull the 'path' value out of each dictionary, like this:

df = df.explode('Pull.Request.Files.Nodes', ignore_index=True)
df['Pull.Request.Files.Nodes'] = df['Pull.Request.Files.Nodes'].apply(lambda r:r['path'])
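One caveat with the lambda above: if a cell holds an empty list, explode leaves NaN in that row, and r['path'] would then raise an error. The following is a minimal sketch of a more defensive variant, assuming such empty cells can occur in your data. The third row (ID 3) is a hypothetical addition for illustration only.

```python
import pandas as pd

# Same sample data as above, plus a hypothetical row with an empty list
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Pull.Request.Files.Nodes": [
        [{"path": "example 1"}],
        [{"path": "example 2"}, {"path": "example 3"}],
        [],  # assumption: some pull requests have no files
    ],
})

# One row per list element; an empty list becomes a single NaN row
exploded = df.explode("Pull.Request.Files.Nodes", ignore_index=True)

# Only index into actual dictionaries; leave anything else as missing
exploded["Pull.Request.Files.Nodes"] = exploded["Pull.Request.Files.Nodes"].apply(
    lambda node: node.get("path") if isinstance(node, dict) else None
)
```

If you would rather drop the rows without any file, calling dropna on that column afterwards should do it.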

Complete code

import pandas as pd

df = pd.DataFrame({'ID': [1, 2],
                   'Pull.Request.Files.Nodes': [[{'path':'example 1'}], [{'path':'example 2'}, {'path':'example 3'}]],
                   })

df = df.explode('Pull.Request.Files.Nodes', ignore_index=True)
df['Pull.Request.Files.Nodes'] = df['Pull.Request.Files.Nodes'].apply(lambda r:r['path'])

#    ID Pull.Request.Files.Nodes
# 0   1                example 1
# 1   2                example 2
# 2   2                example 3
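If you still have the raw JSON records from the API, another option is to let pd.json_normalize do the resizing directly through its record_path and meta parameters. This is only a sketch: the records list and the key names "ID" and "files" are hypothetical stand-ins, since the real structure of your API response is not shown in the question.

```python
import pandas as pd

# Hypothetical raw records; the real GitHub API response will be shaped differently
records = [
    {"ID": 1, "files": [{"path": "example 1"}]},
    {"ID": 2, "files": [{"path": "example 2"}, {"path": "example 3"}]},
]

# record_path: the nested list to expand into one row per element
# meta: top-level fields to repeat alongside each expanded row
flat = pd.json_normalize(records, record_path="files", meta=["ID"])

#         path ID
# 0  example 1  1
# 1  example 2  2
# 2  example 3  2
```

This avoids the separate explode and apply steps, but it only works on the original list of dictionaries, not on a dataframe that has already been flattened.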