I have a dataset that spans several years.
# Import full dataset
df_all = pd.read_csv('https://raw.githubusercontent.com/dssgPT/Plotting-Good-DSSG/main/desafios/006_Incendios/area_ardida.csv')
# Assign years variable to unique values in years column
years = df_all['year'].unique()
# Print years variable
print(years)
array([1999, 2000, 1997, 1992, 1995, 1998, 1980, 1996, 1982, 1987, 1989,
1991, 1981, 1983, 1984, 1986, 1988, 1990, 1993, 1994, 1985, 2005,
2006, 2007, 2010, 2011, 2012, 2001, 2002, 2003, 2004, 2008, 2009,
2013, 2014, 2017, 2019, 2021, 2015, 2018, 2020, 2016])
My objective is to create different dataframes for each year, and then do some operations. I could do it manually as in this example below
df_2017 = df_all[df_all['year']==2017]
df_totals_2017 = df_2017.groupby(["concelho", "year"]).sum()
df_totals_2017.to_csv('/content/drive/MyDrive/2022/DATAVIZ/dssg_2017.csv')
but I'm wondering if there is an optimal way of doing this. Any help would be much appreciated.
CodePudding user response:
This is one way to do it; I'm not sure it is the most efficient, but it will do the job:
for i in years:
    # Filter to one year, then total per concelho
    # (no need to group by year as well, since each slice holds a single year)
    grouped_df = df_all[df_all['year'] == i].groupby(["concelho"]).sum()
    grouped_df.to_csv(f'/content/drive/MyDrive/2022/DATAVIZ/dssg_{i}.csv')
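As a variation on the loop above, pandas can iterate over the year groups directly with `groupby('year')`, which avoids building a boolean mask per year. A minimal sketch with a made-up sample frame (the `concelho`/`year`/`sum` columns are taken from the question; the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for area_ardida.csv, using columns from the question
df_all = pd.DataFrame({
    'concelho': ['Lisboa', 'Porto', 'Lisboa', 'Porto'],
    'year':     [2017, 2017, 2018, 2018],
    'sum':      [10.0, 5.0, 7.5, 2.5],
})

# groupby('year') yields (year, sub-DataFrame) pairs in a single pass
for year, df_year in df_all.groupby('year'):
    totals = df_year.groupby('concelho')['sum'].sum()
    # totals.to_csv(f'dssg_{year}.csv')  # same per-year output pattern as above
    print(year, totals.to_dict())
```

Each `df_year` is exactly the slice `df_all[df_all['year'] == year]`, so the per-year totals come out the same either way.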
CodePudding user response:
Taking your code and restructuring it slightly to loop over all the years and construct a dictionary of split dataframes:
many_dfs = {y: df_all[df_all['year']==y] for y in df_all['year'].unique()}
On further exploration of this dictionary
>>> many_dfs.keys()
dict_keys([1999, 2000, 1997, 1992, 1995, 1998, 1980, 1996, 1982, 1987, 1989, 1991, 1981, 1983, 1984, 1986, 1988, 1990, 1993, 1994, 1985, 2005, 2006, 2007, 2010, 2011, 2012, 2001, 2002, 2003, 2004, 2008, 2009, 2013, 2014, 2017, 2019, 2021, 2015, 2018, 2020, 2016])
>>> type(many_dfs[2017])
<class 'pandas.core.frame.DataFrame'>
>>> many_dfs[2017].columns
Index(['Distrito', 'concelho', 'month', 'year', 'sum', 'longitude',
'latitude'],
dtype='object')
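To connect this back to the question's workflow, each value in `many_dfs` is an ordinary DataFrame, so the original per-year groupby-and-export works unchanged on it. A small sketch with a made-up sample frame (column names from the question, values invented):

```python
import pandas as pd

# Hypothetical stand-in for the full dataset, using columns from the question
df_all = pd.DataFrame({
    'concelho': ['Lisboa', 'Porto', 'Lisboa'],
    'year':     [2017, 2017, 2018],
    'sum':      [10.0, 5.0, 7.5],
})

# Split into one DataFrame per year, keyed by year
many_dfs = {y: df_all[df_all['year'] == y] for y in df_all['year'].unique()}

# Reproduce the question's totals for a single year
df_totals_2017 = many_dfs[2017].groupby(["concelho", "year"]).sum()
print(df_totals_2017)
# df_totals_2017.to_csv('dssg_2017.csv')  # export as in the question
```

The dictionary keeps every year's slice in memory at once, which is convenient for interactive exploration; if you only ever write each slice straight to CSV, looping over `df_all.groupby('year')` avoids holding all of them simultaneously.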