I have this json file sample:
{"T1": [{"time":"2021-03-25T13:31:58 01:00","data":-1}],"T2": [{"time":"2021-03-25T13:17:02 01:00","data":[23,35,54,44,55,44,33,44,55,66,88,65]}]}
and I am loading it like this:
import os
import glob
import pandas as pd
import json
path_to_json = 'dir/dir/data.json'
df = pd.read_json(path_to_json, lines=True)
df
and it looks like this:
When I try to call json_normalize
like pd.json_normalize(df)
it doesn't work.
Any suggestions on how to normalize this json dataframe into T1_time, T1_data and so on would be highly appreciated!
CodePudding user response:
I found it is simpler to get first dictionary from list, convert to Series and concatenate
df1 = df['T1'].str[0].apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].str[0].apply(pd.Series).add_prefix('T2.')
new_df = pd.concat([df1, df2], axis=1)
And the same using loop with column's names
dfs = [df[col].str[0].apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]
new_df = pd.concat(dfs, axis=1)
Minimal working code.
I use io.StrigIO()
only to simulate file in memory - so everyone can simply copy and run it
import pandas as pd
import io
data = '{"T1": [{"time":"2021-03-25T13:31:58 01:00","data":-1}], "T2": [{"time":"2021-03-25T13:17:02 01:00","data":[23,35,54,44,55,44,33,44,55,66,88,65]}]}'
df = pd.read_json(io.StringIO(data), lines=True)
print(df)
# ----------------
df1 = df['T1'].str[0].apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].str[0].apply(pd.Series).add_prefix('T2.')
new_df = pd.concat([df1, df2], axis=1)
print(new_df)
# ----------------
dfs = [df[col].str[0].apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]
new_df = pd.concat(dfs, axis=1)
print(new_df)
Result:
T1 T2
0 [{'time': '2021-03-25T13:31:58 01:00', 'data':... [{'time': '2021-03-25T13:17:02 01:00', 'data':...
T1.time T1.data T2.time T2.data
0 2021-03-25T13:31:58 01:00 -1 2021-03-25T13:17:02 01:00 [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]
T1.time T1.data T2.time T2.data
0 2021-03-25T13:31:58 01:00 -1 2021-03-25T13:17:02 01:00 [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]
CodePudding user response:
can you try this:
import os
import glob
import pandas as pd
import json
path_to_json = 'dir/dir/data.json'
df = pd.read_json(path_to_json, lines=True)
df=df.explode('T1').explode('T2')
df=df.join(pd.json_normalize(df.pop('T1')))
df=pd.concat([df,pd.json_normalize(df.pop('T2'))])
print(df)
'''
time data
0 2021-03-25T13:31:58 01:00 -1
0 2021-03-25T13:17:02 01:00 [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]
'''