I have a pandas series structure like this:
year col2
2019 FIELD_1 1210
FIELD_2 57
FIELD_3 157
2020 FIELD_1 2228
FIELD_2 371
FIELD_3 252
2021 FIELD_1 1505
FIELD_2 138
FIELD_3 133
2022 FIELD_1 836
FIELD_2 166
FIELD_3 75
dtype: int64
I obtain it with a groupby on another DataFrame: other_dataframe.groupby(['year', 'col2']).size()
I want to obtain a list of dictionaries with this structure:
[
{'year': 2019, 'FIELD_1': 1210, 'FIELD_2': 57,'FIELD_3': 157},
{'year': 2020, 'FIELD_1': 2228, 'FIELD_2': 371,'FIELD_3': 252},
{'year': 2021, 'FIELD_1': 1505, 'FIELD_2': 138,'FIELD_3': 133},
{'year': 2022, 'FIELD_1': 836, 'FIELD_2': 166,'FIELD_3': 75},
]
How can I achieve this?
CodePudding user response:
Assuming the multi-index Series is assigned to the variable s and looks like the following:
import pandas as pd
# Sample data: the years are strings here and the index levels are unnamed
s = pd.Series([1210, 57, 157, 2228, 371, 252, 1505, 138, 133, 836, 166, 75], index=pd.MultiIndex.from_tuples([('2019', 'FIELD_1'), ('2019', 'FIELD_2'), ('2019', 'FIELD_3'), ('2020', 'FIELD_1'), ('2020', 'FIELD_2'), ('2020', 'FIELD_3'), ('2021', 'FIELD_1'), ('2021', 'FIELD_2'), ('2021', 'FIELD_3'), ('2022', 'FIELD_1'), ('2022', 'FIELD_2'), ('2022', 'FIELD_3')]))
[Out]:
2019 FIELD_1 1210
FIELD_2 57
FIELD_3 157
2020 FIELD_1 2228
FIELD_2 371
FIELD_3 252
2021 FIELD_1 1505
FIELD_2 138
FIELD_3 133
2022 FIELD_1 836
FIELD_2 166
FIELD_3 75
One way to do it is as follows (to skip the step-by-step reasoning, go directly to the end of this answer).
First, unstack the inner index level so that each field becomes a column, then convert the result into a list of dictionaries (the desired output format):
d = s.unstack().to_dict('records')
[Out]:
[{'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
However, as the previous output shows, the year is missing, because it lives in the index rather than in a column. To include it, reset the index before converting:
d = s.unstack().reset_index().to_dict('records')
[Out]:
[{'index': '2019', 'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'index': '2020', 'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'index': '2021', 'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'index': '2022', 'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
Finally, since the column should be named year rather than index, it has to be renamed. The operation that covers all of the OP's requirements is:
d = s.unstack().reset_index().rename(columns={'index': 'year'}).to_dict('records')
[Out]:
[{'year': '2019', 'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'year': '2020', 'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'year': '2021', 'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'year': '2022', 'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
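In the question, the Series comes from groupby(['year', 'col2']).size(), so its index levels are already named and the years keep their integer dtype. In that case reset_index() should produce a year column directly and the rename step becomes unnecessary. The following is a minimal sketch under that assumption; other_dataframe here is a hypothetical toy frame, not the asker's data, and fill_value=0 is an optional addition that keeps the counts as integers when a year/field combination never occurs.

```python
import pandas as pd

# Hypothetical stand-in for the asker's other_dataframe (toy data only).
other_dataframe = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020],
    "col2": ["FIELD_1", "FIELD_1", "FIELD_2", "FIELD_1"],
})

# Same call as in the question: the index levels are named 'year' and 'col2'.
s = other_dataframe.groupby(["year", "col2"]).size()

records = (
    s.unstack(fill_value=0)  # one column per col2 value; missing pairs become 0
     .reset_index()          # 'year' moves from the index into a regular column
     .to_dict("records")
)
print(records)
```

Because the level is named year, no 'index' column appears, and the (2020, FIELD_2) pair that is absent from the toy data shows up as 0 rather than NaN.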
CodePudding user response:
Using .pivot
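records = (
    df  # the Series returned by groupby(['year', 'col2']).size()
    .reset_index(name="count")  # give the counts column a name for pivot
    .pivot(index="year", columns="col2", values="count")
    .reset_index()
    .to_dict(orient="records")
)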
print(records)
[{'year': 2019, 'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'year': 2020, 'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'year': 2021, 'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'year': 2022, 'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
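One caveat with the pivot route: if some year/field combination never occurs, pivot leaves NaN in that cell and the counts come back as floats. The sketch below illustrates one possible way to handle that, assuming a hypothetical toy other_dataframe and a counts column named count (both names are illustrative, not from the original answer).

```python
import pandas as pd

# Hypothetical toy data: (2020, 'FIELD_2') never occurs.
other_dataframe = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020],
    "col2": ["FIELD_1", "FIELD_1", "FIELD_2", "FIELD_1"],
})

# Long format: one row per (year, col2) pair with its count.
counts = (
    other_dataframe.groupby(["year", "col2"])
    .size()
    .reset_index(name="count")
)

wide = counts.pivot(index="year", columns="col2", values="count")
# The missing pair is NaN at this point; fill it and cast back to integers.
wide = wide.fillna(0).astype(int)

pivot_records = wide.reset_index().to_dict(orient="records")
print(pivot_records)
```

The fillna(0).astype(int) step is applied before reset_index() so that only the field columns are touched and the year column keeps its own dtype.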