Home > front end >  How to normalize json from pandas dataframe
How to normalize json from pandas dataframe

Time:10-17

I have this json file sample:

{"T1": [{"time":"2021-03-25T13:31:58 01:00","data":-1}],"T2": [{"time":"2021-03-25T13:17:02 01:00","data":[23,35,54,44,55,44,33,44,55,66,88,65]}]}

and I am loading it like this:

import os 
import glob
import pandas  as pd
import json

path_to_json = 'dir/dir/data.json' 

df = pd.read_json(path_to_json, lines=True)

df

and it looks like this:

enter image description here

When I try to call json_normalize like pd.json_normalize(df) it doesn't work. Any suggestions on how to normalize this json dataframe into T1_time, T1_data and so on would be highly appreciated!

CodePudding user response:

I found it is simpler to get first dictionary from list, convert to Series and concatenate

df1 = df['T1'].str[0].apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].str[0].apply(pd.Series).add_prefix('T2.')

new_df = pd.concat([df1, df2], axis=1)

And the same using loop with column's names

dfs = [df[col].str[0].apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]

new_df = pd.concat(dfs, axis=1)

Minimal working code.

I use io.StrigIO() only to simulate file in memory - so everyone can simply copy and run it

import pandas as pd
import io

data = '{"T1": [{"time":"2021-03-25T13:31:58 01:00","data":-1}], "T2": [{"time":"2021-03-25T13:17:02 01:00","data":[23,35,54,44,55,44,33,44,55,66,88,65]}]}'
df = pd.read_json(io.StringIO(data), lines=True)
print(df)

# ----------------

df1 = df['T1'].str[0].apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].str[0].apply(pd.Series).add_prefix('T2.')
new_df = pd.concat([df1, df2], axis=1)
print(new_df)

# ----------------

dfs = [df[col].str[0].apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]
new_df = pd.concat(dfs, axis=1)
print(new_df)

Result:

                                                  T1                                                 T2
0  [{'time': '2021-03-25T13:31:58 01:00', 'data':...  [{'time': '2021-03-25T13:17:02 01:00', 'data':...

                     T1.time  T1.data                    T2.time                                           T2.data
0  2021-03-25T13:31:58 01:00       -1  2021-03-25T13:17:02 01:00  [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]

                     T1.time  T1.data                    T2.time                                           T2.data
0  2021-03-25T13:31:58 01:00       -1  2021-03-25T13:17:02 01:00  [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]

CodePudding user response:

can you try this:

import os 
import glob
import pandas  as pd
import json

path_to_json = 'dir/dir/data.json' 

df = pd.read_json(path_to_json, lines=True)
df=df.explode('T1').explode('T2')
df=df.join(pd.json_normalize(df.pop('T1')))
df=pd.concat([df,pd.json_normalize(df.pop('T2'))])
print(df)
'''
    time                        data
0   2021-03-25T13:31:58 01:00   -1
0   2021-03-25T13:17:02 01:00   [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]

'''

  • Related