I have the following data, which is a list of dictionaries whose 'Flux' values are themselves lists of single-key dictionaries:
data = [
{'Time': 18057610.0,
'Flux': [{'V0': -1.4209e-15},
{'V1': 2.7353e-16},
{'V2': 1.1935e-15},
{'V3': 1.1624},
{'V4': -6.1692e-15},
{'V5': 3.2218e-15}]},
{'Time': 18057620.4,
'Flux': [{'V0': 2.4377e-16},
{'V1': -6.2809e-15},
{'V2': 1.6456e-15},
{'V3': 1.1651},
{'V4': 1.7147e-15},
{'V5': 9.8872e-16}]},
{'Time': 18057631.1,
'Flux': [{'V0': 4.1124e-15},
{'V1': 1.5598e-15},
{'V2': -2.325e-16},
{'V3': 1.1638},
{'V4': -3.9983e-15},
{'V5': 4.459e-16}]}]
I want to get something like this:
preferred_df:
V0 V1 ... V4 V5
Time ...
18057610.0 -1.420900e-15 2.735300e-16 ... -6.169200e-15 3.221800e-15
18057620.4 2.437700e-16 -6.280900e-15 ... 1.714700e-15 9.887200e-16
18057631.1 4.112400e-15 1.559800e-15 ... -3.998300e-15 4.459000e-16
I came up with the following code, which serves the purpose:
df = pd.DataFrame(data).explode('Flux').reset_index(drop=True)
df = df.join(pd.DataFrame(df.pop('Flux').values.tolist())).groupby('Time').sum()
However, I don't want to use groupby and sum(). What are the other ways (a dictionary comprehension?) to flatten the "Flux" column without getting NaN values while flattening the dictionaries, so that I end up with preferred_df? I tried json_normalize() but got the same NaNs and still needed groupby() and sum().
CodePudding user response:
Try:
df = pd.DataFrame(
    # flatten each record's list of single-key dicts into one row dict
    [{k: v for d2 in d["Flux"] for k, v in d2.items()} for d in data],
    index=[d["Time"] for d in data],  # use Time as the index
)
print(df)
Prints:
V0 V1 V2 V3 V4 V5
18057610.0 -1.420900e-15 2.735300e-16 1.193500e-15 1.1624 -6.169200e-15 3.221800e-15
18057620.4 2.437700e-16 -6.280900e-15 1.645600e-15 1.1651 1.714700e-15 9.887200e-16
18057631.1 4.112400e-15 1.559800e-15 -2.325000e-16 1.1638 -3.998300e-15 4.459000e-16
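The index here is unnamed. To label it Time, as in preferred_df, one small addition (not part of the original answer, but a standard pandas method) is rename_axis:
df = df.rename_axis("Time")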
CodePudding user response:
You could use the built-in collections.ChainMap class in a list comprehension (the | dict-merge operator requires Python 3.9+):
from collections import ChainMap
# note: d.pop('Flux') removes the key from the dicts in data as a side effect
out = pd.DataFrame([d | ChainMap(*d.pop('Flux')) for d in data])
Output:
Time V0 V1 V2 V3 V4 V5
0 18057610.0 -1.420900e-15 2.735300e-16 1.193500e-15 1.1624 -6.169200e-15 3.221800e-15
1 18057620.4 2.437700e-16 -6.280900e-15 1.645600e-15 1.1651 1.714700e-15 9.887200e-16
2 18057631.1 4.112400e-15 1.559800e-15 -2.325000e-16 1.1638 -3.998300e-15 4.459000e-16
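Note that this keeps Time as a regular column, and that d.pop('Flux') strips the key from the original dicts in data. A variant sketch (assuming the same data; not part of the original answer) that leaves data intact and sets Time as the index to match preferred_df:
from collections import ChainMap
import pandas as pd

rows = []
for d in data:
    d = dict(d)                       # shallow copy, so the original dict keeps its 'Flux' key
    flux = d.pop('Flux')              # list of single-key dicts, e.g. [{'V0': ...}, ...]
    rows.append(d | ChainMap(*flux))  # merge 'Time' with the flattened flux values
out = pd.DataFrame(rows).set_index('Time')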
CodePudding user response:
One option is pd.json_normalize followed by a groupby (it won't be as efficient as the previous options, where the wrangling is done in plain Python):
(
    pd.json_normalize(data, 'Flux', ['Time'])
      .groupby('Time', as_index=False)
      .min()   # picks the single non-NaN value in each group
)
Time V0 V1 V2 V3 V4 V5
0 18057610.0 -1.420900e-15 2.735300e-16 1.193500e-15 1.1624 -6.169200e-15 3.221800e-15
1 18057620.4 2.437700e-16 -6.280900e-15 1.645600e-15 1.1651 1.714700e-15 9.887200e-16
2 18057631.1 4.112400e-15 1.559800e-15 -2.325000e-16 1.1638 -3.998300e-15 4.459000e-16
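Why min() works here: json_normalize produces one row per flux dict, with NaN in every other V column, so each (Time, column) group holds exactly one non-NaN value, and any NaN-skipping reducer (min, max, sum, first) recovers it. A variant sketch (not part of the original answer) using first() and keeping Time as the index, as in preferred_df:
out = (
    pd.json_normalize(data, 'Flux', ['Time'])
      .groupby('Time')   # as_index=True (the default) makes Time the index
      .first()           # first non-NaN value per column in each group
)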