Pandas series to list of dictionaries with specific structure

Time:10-12

I have a pandas Series structured like this:

year  col2
2019  FIELD_1     1210
      FIELD_2       57
      FIELD_3      157
2020  FIELD_1     2228
      FIELD_2      371
      FIELD_3      252
2021  FIELD_1     1505
      FIELD_2      138
      FIELD_3      133
2022  FIELD_1      836
      FIELD_2      166
      FIELD_3       75

dtype: int64

I obtain it with groupby from another DataFrame: other_dataframe.groupby(['year', 'col2']).size()
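For readers who want to reproduce this, here is a minimal sketch of how groupby(...).size() produces such a Series. The contents of other_dataframe are hypothetical (the question does not show them); the point is that size() counts rows per (year, col2) pair and names the MultiIndex levels after the grouping columns.

```python
import pandas as pd

# Hypothetical stand-in for the OP's other_dataframe: one row per record.
other_dataframe = pd.DataFrame(
    {
        "year": [2019, 2019, 2019, 2020, 2020],
        "col2": ["FIELD_1", "FIELD_1", "FIELD_2", "FIELD_1", "FIELD_3"],
    }
)

# size() counts the rows in each (year, col2) group and returns a Series
# whose MultiIndex levels are named after the grouping columns.
s = other_dataframe.groupby(["year", "col2"]).size()

print(s)
print(list(s.index.names))  # ['year', 'col2']
```

The level names matter later: they decide what the year column is called after reset_index().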

I want to obtain a list of dictionaries with this structure:

[
   {'year': 2019, 'FIELD_1': 1210, 'FIELD_2': 57,'FIELD_3': 157},
   {'year': 2020, 'FIELD_1': 2228, 'FIELD_2': 371,'FIELD_3': 252},
   {'year': 2021, 'FIELD_1': 1505, 'FIELD_2': 138,'FIELD_3': 133},
   {'year': 2022, 'FIELD_1': 836, 'FIELD_2': 166,'FIELD_3': 75},
]

How can I achieve this?

Answer:

Assume the MultiIndex Series is assigned to the variable s and looks like the following (note that in this example the years are strings and the index levels are unnamed):

import pandas as pd

s = pd.Series(
    [1210, 57, 157, 2228, 371, 252, 1505, 138, 133, 836, 166, 75],
    index=pd.MultiIndex.from_product(
        [['2019', '2020', '2021', '2022'], ['FIELD_1', 'FIELD_2', 'FIELD_3']]
    ),
)

[Out]:
2019  FIELD_1    1210
      FIELD_2      57
      FIELD_3     157
2020  FIELD_1    2228
      FIELD_2     371
      FIELD_3     252
2021  FIELD_1    1505
      FIELD_2     138
      FIELD_3     133
2022  FIELD_1     836
      FIELD_2     166
      FIELD_3      75

One way to do it is as follows (if you are not interested in the thought process, skip to the end of this answer).

First, unstack the Series and convert the result into a list of dictionaries (the desired output format):

d = s.unstack().to_dict('records')

[Out]:

[{'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
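To see why the year disappears, it helps to look at the intermediate DataFrame. A minimal sketch, using only the first two years of the same string-year Series for brevity: unstack() moves the inner index level (FIELD_*) into the columns, while the years stay in the index, and to_dict('records') only emits column values.

```python
import pandas as pd

# Two-year subset of the answer's Series (string years, unnamed levels).
s = pd.Series(
    [1210, 57, 157, 2228, 371, 252],
    index=pd.MultiIndex.from_product(
        [["2019", "2020"], ["FIELD_1", "FIELD_2", "FIELD_3"]]
    ),
)

wide = s.unstack()  # inner level (FIELD_*) becomes the columns
print(wide)
print(wide.index.tolist())  # ['2019', '2020']: the years stay in the index
print(wide.to_dict("records"))  # records ignore the index, so the years are lost
```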

However, as the previous output shows, the year is missing: it lives in the index, and to_dict('records') does not include the index. To keep the year, reset the index first, so instead of the previous operation, do the following

d = s.unstack().reset_index().to_dict('records')

[Out]:

[{'index': '2019', 'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'index': '2020', 'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'index': '2021', 'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'index': '2022', 'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]

Finally, since the column should be named year rather than index, rename it. The operation that covers all of the OP's requirements is therefore:

d = s.unstack().reset_index().rename(columns={'index': 'year'}).to_dict('records')

[Out]:

[{'year': '2019', 'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'year': '2020', 'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'year': '2021', 'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'year': '2022', 'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
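The rename is only needed because the example Series has unnamed index levels, and the years come out as strings because they were built as strings. A sketch assuming the OP's actual Series, i.e. integer years and the level names year and col2 that groupby(['year', 'col2']).size() would assign: reset_index() then names the column year on its own, so the rename step can be dropped.

```python
import pandas as pd

# Same numbers as the question, but with integer years and the level names
# that other_dataframe.groupby(['year', 'col2']).size() would produce.
s = pd.Series(
    [1210, 57, 157, 2228, 371, 252, 1505, 138, 133, 836, 166, 75],
    index=pd.MultiIndex.from_product(
        [[2019, 2020, 2021, 2022], ["FIELD_1", "FIELD_2", "FIELD_3"]],
        names=["year", "col2"],
    ),
)

# The 'year' level name carries over to the column created by reset_index(),
# so no rename is required.
d = s.unstack().reset_index().to_dict("records")
print(d)
```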

Answer:

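Using .pivot (assuming s is the Series returned by other_dataframe.groupby(['year', 'col2']).size(), so its index levels are named year and col2):

records = (
    s
    .reset_index(name="count")  # levels become columns 'year' and 'col2'; counts go to 'count'
    .pivot(index="year", columns="col2", values="count")  # one row per year, one column per FIELD_*
    .reset_index()  # turn 'year' back into a regular column
    .to_dict(orient="records")
)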

print(records)

[{'year': 2019, 'FIELD_1': 1210, 'FIELD_2': 57, 'FIELD_3': 157}, {'year': 2020, 'FIELD_1': 2228, 'FIELD_2': 371, 'FIELD_3': 252}, {'year': 2021, 'FIELD_1': 1505, 'FIELD_2': 138, 'FIELD_3': 133}, {'year': 2022, 'FIELD_1': 836, 'FIELD_2': 166, 'FIELD_3': 75}]
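As a quick sanity check, a sketch (assuming the OP's named Series with integer years) comparing the two answers: both produce the same records. unstack() is a little shorter because the data is already indexed by (year, col2), whereas pivot() needs the levels turned into columns first.

```python
import pandas as pd

# Assumed to match the OP's groupby(['year', 'col2']).size() result.
s = pd.Series(
    [1210, 57, 157, 2228, 371, 252, 1505, 138, 133, 836, 166, 75],
    index=pd.MultiIndex.from_product(
        [[2019, 2020, 2021, 2022], ["FIELD_1", "FIELD_2", "FIELD_3"]],
        names=["year", "col2"],
    ),
)

# First answer: reshape directly from the MultiIndex.
via_unstack = s.unstack().reset_index().to_dict("records")

# Second answer: flatten to columns, then pivot back to wide form.
via_pivot = (
    s.reset_index(name="count")
    .pivot(index="year", columns="col2", values="count")
    .reset_index()
    .to_dict("records")
)

print(via_unstack == via_pivot)  # True
```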