Pandas JSON Normalize multiple columns in a dataframe

Time:12-03

I have a dataframe with two regular columns (Root_id_PK and random_number) followed by JSON blob columns named JSON_0 through JSON_99.

The JSON blobs all look something like this:

{"id":"dddd1", "random_number":"77777"}

What I want is a dataframe in which every JSON blob column is expanded into its own columns, named for example id_from_JSON_0 and random_number_from_JSON_0.

Basically, I need a way to iterate over all the JSON blob columns, normalize them, and put the results back in the dataframe in the proper rows (0-99). I have tried the following:

pd.json_normalize(data_frame.iloc[:, JSON_0,JSON_99])

I get the following error:

IndexingError: Too many indexers
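
For reference, the error seems to come from the indexing itself rather than from json_normalize: `.iloc[:, JSON_0, JSON_99]` passes three indexers (rows plus two separate column entries) to a two-dimensional frame, whereas a range of columns is written as a single slice. A minimal sketch, assuming the columns are labelled JSON_0, JSON_1, ... and start at position 2; the toy frame is made up for illustration and has only three blob columns:

```python
import pandas as pd

# Toy frame standing in for the real one (hypothetical data).
df = pd.DataFrame({
    "Root_id_PK": [1, 2],
    "random_number": [10, 20],
    "JSON_0": [{"id": "a0"}, {"id": "b0"}],
    "JSON_1": [{"id": "a1"}, {"id": "b1"}],
    "JSON_2": [{"id": "a2"}, {"id": "b2"}],
})

# Label-based slice: with .loc both endpoints are included.
by_label = df.loc[:, "JSON_0":"JSON_2"]

# Position-based slice: every column from position 2 onward.
by_position = df.iloc[:, 2:]

print(list(by_label.columns))
print(list(by_position.columns))
```

Either selection yields the blob columns as a dataframe, which still has to be expanded one column at a time, since pd.json_normalize expects a dict or a list of dicts rather than a dataframe.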

I could normalize each JSON blob column individually, but that is inefficient. I can't think of a proper way to do this with a lambda function or a for loop because of the JSON blobs. The for loop I wrote gives me the same error:

array=[]
for app in data_frame.iloc[:, JSON_0,JSON_99]:
    data = {
        'id': data['id']
        
    }
    array.append(data)


test= pd.DataFrame(array)


IndexingError: Too many indexers

Also, some of the JSON blobs have NaN values.

Any suggestions would be great.

Answer:

Can you try this:

normalized = pd.concat([df[i].apply(pd.Series) for i in df.iloc[:,2:]],axis=1) #2 is the position number of JSON_0.
final = pd.concat([df[['Root_id_PK','random_number']],normalized],axis=1)
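
Note that `apply(pd.Series)` expects each cell to already be a dict. If the blobs are stored as JSON strings, or if some cells are NaN as the question mentions, the result may contain unexpected columns. Here is a hedged sketch of one way to handle both cases. The sample data and the helper `to_dict` are hypothetical, and the blob columns are assumed to start at position 2:

```python
import json
import pandas as pd

# Hypothetical data: blobs stored as JSON strings, one of them missing.
df = pd.DataFrame({
    "Root_id_PK": [1, 2],
    "random_number": [10, 20],
    "JSON_0": ['{"id": "dddd1", "random_number": "77777"}', None],
    "JSON_1": ['{"id": "eeee1", "random_number": "88888"}',
               '{"id": "eeee2", "random_number": "99999"}'],
})

def to_dict(value):
    """Return a dict for a blob; missing values become an empty dict."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        return json.loads(value)
    return {}  # NaN / None

blob_cols = df.columns[2:]  # assumes the blobs start at position 2
parts = []
for col in blob_cols:
    records = df[col].map(to_dict).tolist()
    # Reuse the original index so rows line up when concatenating.
    expanded = pd.DataFrame(records, index=df.index)
    parts.append(expanded.add_suffix(f"_from_{col}"))

final = pd.concat([df[["Root_id_PK", "random_number"]], *parts], axis=1)
print(final)
```

Rows whose blob is missing simply end up with NaN in the expanded columns, so the original row positions are preserved.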

If you want the column names as in the question:

normalized = pd.concat(
    [df[i].apply(pd.Series).rename(columns={'id': 'id_from_{}'.format(i),
                                            'random_number': 'random_number_from_{}'.format(i)})
     for i in df.iloc[:, 2:]],
    axis=1)
final = pd.concat([df[['Root_id_PK','random_number']],normalized],axis=1)
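
Since the question asks about pd.json_normalize specifically, the same idea can also be written with that function, which additionally flattens nested dicts into dotted column names. This is only a sketch under assumptions: the helper `normalize_column` and the sample frame are hypothetical, the blobs are assumed to be dicts already, and the nested `meta` key is invented to show the flattening:

```python
import pandas as pd

# Hypothetical frame: blobs are dicts, one is nested and one is missing.
df = pd.DataFrame({
    "Root_id_PK": [1, 2],
    "JSON_0": [{"id": "dddd1", "meta": {"random_number": "77777"}}, float("nan")],
})

def normalize_column(frame, col):
    """Flatten one dict column with pd.json_normalize, keeping the row index."""
    # json_normalize needs dicts, so replace missing cells with empty dicts.
    records = [v if isinstance(v, dict) else {} for v in frame[col]]
    flat = pd.json_normalize(records)
    flat.index = frame.index
    return flat.add_prefix(f"{col}_")

final = df[["Root_id_PK"]].join(normalize_column(df, "JSON_0"))
print(final.columns.tolist())
```

With many blob columns, `normalize_column` could be called once per column and the results concatenated with `axis=1`, in the same way as above.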