dataframe to list of dictionary

Time:06-30

I have the following df:

df = pd.DataFrame({"year":[2020,2020,2020,2021,2021,2021,2022,2022, 2022],"region":['europe','USA','africa','europe','USA','africa','europe','USA','africa'],'volume':[1,6,5,3,8,7,6,3,5]})


I wish to convert it to a list of dictionaries in which each year appears only once, as the key of its own item. For example:

[{'year':2020,'europe':1,'USA':6,'africa':5,}...]

When I do:

df.set_index('year').to_dict('records')

I lose the years, and the result is not the list I want.
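To make the problem concrete, here is a minimal sketch of what the attempt above returns. It rebuilds the same df (the region list is shortened with `* 3`, which is only a convenience) and shows that orient 'records' serializes the columns only, so the year index is dropped and there is still one dictionary per original row.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022, 2022],
    "region": ["europe", "USA", "africa"] * 3,
    "volume": [1, 6, 5, 3, 8, 7, 6, 3, 5],
})

# 'records' emits one dict per row built from the columns only;
# the index (here: year) is not included in the output.
records = df.set_index("year").to_dict("records")

print(len(records))  # one entry per original row, not per year
print(records[0])    # only 'region' and 'volume' keys
```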

CodePudding user response:

Another approach uses pivot before to_dict(orient='records'):

df.pivot(
    index='year',
    columns='region',
    values='volume'
).reset_index().to_dict(orient='records')

#Output:
#[{'year': 2020, 'USA': 6, 'africa': 5, 'europe': 1},
# {'year': 2021, 'USA': 8, 'africa': 7, 'europe': 3},
# {'year': 2022, 'USA': 3, 'africa': 5, 'europe': 6}]
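One caveat, offered as a sketch rather than part of the answer above: pivot expects each (year, region) pair to occur once and raises a ValueError otherwise. If the real data can contain repeats, pivot_table with an aggregation function is one option. The df_dup frame and the choice of "sum" below are assumptions for illustration only; another aggregation may fit your data better.

```python
import pandas as pd

# Hypothetical data: (2020, 'europe') appears twice.
df_dup = pd.DataFrame({
    "year": [2020, 2020, 2020, 2020, 2021, 2021, 2021],
    "region": ["europe", "europe", "USA", "africa", "europe", "USA", "africa"],
    "volume": [1, 2, 6, 5, 3, 8, 7],
})

# df_dup.pivot(index="year", columns="region", values="volume") would raise
# ValueError here because of the duplicate pair.
# pivot_table aggregates duplicates instead (assumption: summing is wanted).
records = (
    df_dup.pivot_table(index="year", columns="region",
                       values="volume", aggfunc="sum")
    .reset_index()
    .to_dict(orient="records")
)

print(records)
```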

CodePudding user response:

Try:

d = [
    {"year": y, **dict(zip(x["region"], x["volume"]))}
    for y, x in df.groupby("year")
]

print(d)

Prints:

[
    {"year": 2020, "europe": 1, "USA": 6, "africa": 5},
    {"year": 2021, "europe": 3, "USA": 8, "africa": 7},
    {"year": 2022, "europe": 6, "USA": 3, "africa": 5},
]

CodePudding user response:

You can use groupby on year and then zip region and volume:

import pandas as pd

df = pd.DataFrame({"year":[2020,2020,2020,2021,2021,2021,2022,2022, 2022],"region":['europe','USA','africa','europe','USA','africa','europe','USA','africa'],'volume':[1,6,5,3,8,7,6,3,5]})

year_dfs = df.groupby("year")
records = []
for year, year_df in year_dfs:
    year_dict = {key: value for key, value in zip(year_df["region"], year_df["volume"])}
    year_dict["year"] = year
    records.append(year_dict)
""" Answer
[{'europe': 1, 'USA': 6, 'africa': 5, 'year': 2020},
 {'europe': 3, 'USA': 8, 'africa': 7, 'year': 2021},
 {'europe': 6, 'USA': 3, 'africa': 5, 'year': 2022}]
"""

CodePudding user response:

To break it down step by step: you could use pivot to reshape your df so that each year becomes one row, each region becomes a column, and volume supplies the values:

df.pivot(index='year', columns='region', values='volume')

region  USA  africa  europe
year                       
2020      6       5       1
2021      8       7       3
2022      3       5       6

To get this into dictionary format, you can chain the .to_dict('index') method onto the same line:

x = df.pivot(index='year', columns='region', values='volume').to_dict('index')

{2020: {'USA': 6, 'africa': 5, 'europe': 1}, 2021: {'USA': 8, 'africa': 7, 'europe': 3}, 2022: {'USA': 3, 'africa': 5, 'europe': 6}}

Finally, you could use a list comprehension to get it into your desired format:

output = [dict(x[y], **{'year':y}) for y in x]
[{'USA': 6, 'africa': 5, 'europe': 1, 'year': 2020}, {'USA': 8, 'africa': 7, 'europe': 3, 'year': 2021}, {'USA': 3, 'africa': 5, 'europe': 6, 'year': 2022}]
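Putting the three steps together, a self-contained sketch might look like the following. It assumes a recent pandas release, where the pivot arguments are passed by keyword, and it places 'year' first in each dictionary to match the layout asked for in the question. The names wide and by_year are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022, 2022],
    "region": ["europe", "USA", "africa"] * 3,
    "volume": [1, 6, 5, 3, 8, 7, 6, 3, 5],
})

# Step 1: one row per year, one column per region.
wide = df.pivot(index="year", columns="region", values="volume")

# Step 2: nested dict of the form {year: {region: volume}}.
by_year = wide.to_dict("index")

# Step 3: fold each year key into its own dict, with 'year' listed first.
output = [{"year": year, **volumes} for year, volumes in by_year.items()]

print(output)
```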