I have the following df:
df = pd.DataFrame({"year":[2020,2020,2020,2021,2021,2021,2022,2022, 2022],"region":['europe','USA','africa','europe','USA','africa','europe','USA','africa'],'volume':[1,6,5,3,8,7,6,3,5]})
I want to convert it to a list of dictionaries in which each year appears only once, with one item per year. For example:
[{'year': 2020, 'europe': 1, 'USA': 6, 'africa': 5}, ...]
When I do:
df.set_index('year').to_dict('records')
I lose the years, and the list is not in the format shown above.
CodePudding user response:
Another approach is to pivot before calling to_dict(orient='records'):
df.pivot(
    index='year',
    columns='region',
    values='volume'
).reset_index().to_dict(orient='records')
#Output:
#[{'year': 2020, 'USA': 6, 'africa': 5, 'europe': 1},
# {'year': 2021, 'USA': 8, 'africa': 7, 'europe': 3},
# {'year': 2022, 'USA': 3, 'africa': 5, 'europe': 6}]
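For a version that can be run as-is, here is a minimal sketch of the same pivot approach with the question's data included. The function name year_records is only an illustrative choice, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022, 2022],
    "region": ["europe", "USA", "africa"] * 3,
    "volume": [1, 6, 5, 3, 8, 7, 6, 3, 5],
})


def year_records(frame):
    # year_records is a hypothetical helper name used for illustration.
    # pivot turns each region into a column, with one row per year;
    # reset_index moves 'year' back into a regular column so that
    # to_dict(orient='records') includes it in every dictionary.
    wide = frame.pivot(index="year", columns="region", values="volume")
    return wide.reset_index().to_dict(orient="records")


records = year_records(df)
print(records)
```

Keep in mind that pivot raises a ValueError if a year/region pair appears more than once, which is not the case for the sample data.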
CodePudding user response:
Try:
d = [
    {"year": y, **dict(zip(x["region"], x["volume"]))}
    for y, x in df.groupby("year")
]
print(d)
Prints:
[
{"year": 2020, "europe": 1, "USA": 6, "africa": 5},
{"year": 2021, "europe": 3, "USA": 8, "africa": 7},
{"year": 2022, "europe": 6, "USA": 3, "africa": 5},
]
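The dictionaries here list the regions in their original row order, while the pivot-based answer lists them in sorted column order. Since dictionary equality ignores key order, a quick check along these lines (a sketch using the question's data; the variable names are illustrative) should confirm that both approaches hold the same content:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022, 2022],
    "region": ["europe", "USA", "africa"] * 3,
    "volume": [1, 6, 5, 3, 8, 7, 6, 3, 5],
})

# groupby approach: one dict per year, regions in original row order
by_group = [
    {"year": y, **dict(zip(x["region"], x["volume"]))}
    for y, x in df.groupby("year")
]

# pivot approach: one dict per year, regions as sorted columns
by_pivot = (
    df.pivot(index="year", columns="region", values="volume")
    .reset_index()
    .to_dict(orient="records")
)

# dict comparison ignores key order, so only the content is compared
same = by_group == by_pivot
print(same)
```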
CodePudding user response:
You can group by year and then zip region and volume:
import pandas as pd
df = pd.DataFrame({"year":[2020,2020,2020,2021,2021,2021,2022,2022, 2022],"region":['europe','USA','africa','europe','USA','africa','europe','USA','africa'],'volume':[1,6,5,3,8,7,6,3,5]})
year_dfs = df.groupby("year")
records = []
for year, year_df in year_dfs:
    year_dict = {key: value for key, value in zip(year_df["region"], year_df["volume"])}
    year_dict["year"] = year
    records.append(year_dict)
""" Answer
[{'europe': 1, 'USA': 6, 'africa': 5, 'year': 2020},
{'europe': 3, 'USA': 8, 'africa': 7, 'year': 2021},
{'europe': 6, 'USA': 3, 'africa': 5, 'year': 2022}]
"""
CodePudding user response:
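To break this down step by step: you can use pivot so that each year becomes a single row, the regions become the columns, and volume supplies the values. The arguments are passed by keyword because recent pandas versions no longer accept them positionally:
df.pivot(index='year', columns='region', values='volume')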
region  USA  africa  europe
year
2020      6       5       1
2021      8       7       3
2022      3       5       6
To get this into dictionary form, chain .to_dict('index') onto the pivot, all in one line:
x = df.pivot(index='year', columns='region', values='volume').to_dict('index')
{2020: {'USA': 6, 'africa': 5, 'europe': 1}, 2021: {'USA': 8, 'africa': 7, 'europe': 3}, 2022: {'USA': 3, 'africa': 5, 'europe': 6}}
Finally, you can use a list comprehension to get it into your desired format:
output = [dict(x[y], **{'year':y}) for y in x]
[{'USA': 6, 'africa': 5, 'europe': 1, 'year': 2020}, {'USA': 8, 'africa': 7, 'europe': 3, 'year': 2021}, {'USA': 3, 'africa': 5, 'europe': 6, 'year': 2022}]
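Note that this puts 'year' last in each dictionary, whereas the example in the question has it first. If key order matters, one possible variant (a self-contained sketch that rebuilds x from the question's data) places the year first and unpacks that year's region values after it:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022, 2022],
    "region": ["europe", "USA", "africa"] * 3,
    "volume": [1, 6, 5, 3, 8, 7, 6, 3, 5],
})

# Same intermediate as in the steps above: {year: {region: volume}}
x = df.pivot(index="year", columns="region", values="volume").to_dict(orient="index")

# Put 'year' first, then unpack that year's region values after it
output = [{"year": y, **x[y]} for y in x]
print(output)
```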