Home > Software design >  Create Dataframes based on array values
Create Dataframes based on array values

Time:07-07

I have a dataset that spans over several years.

# Import full dataset 
df_all = pd.read_csv('https://raw.githubusercontent.com/dssgPT/Plotting-Good-DSSG/main/desafios/006_Incendios/area_ardida.csv')
# Assign years variable to unique values in years column
years = df_all['year'].unique()
# Print years variable
print(years)

array([1999, 2000, 1997, 1992, 1995, 1998, 1980, 1996, 1982, 1987, 1989,
       1991, 1981, 1983, 1984, 1986, 1988, 1990, 1993, 1994, 1985, 2005,
       2006, 2007, 2010, 2011, 2012, 2001, 2002, 2003, 2004, 2008, 2009,
       2013, 2014, 2017, 2019, 2021, 2015, 2018, 2020, 2016])

My objective is to create different dataframes for each year, and then do some operations. I could do it manually as in this example below

df_2017 = df_all[df_all['year']==2017]
df_totals_2017 = df_2017.groupby(["concelho", "year"]).sum()
df_totals_2017.to_csv('/content/drive/MyDrive/2022/DATAVIZ/dssg_2017.csv')

but I'm wondering if there is a optimal way of doing this. Any help would be much appreciated.

CodePudding user response:

This is a way to do it, not sure if it is most efficient but hey it will do the job

for i in years:
    grouped_df = df_all[df_all.eq(i)].groupby(["concelho"]).sum()
    #Little redundant to groupby year
    grouped_df.to_csv(f'/content/drive/MyDrive/2022/DATAVIZ/dssg_{i}.csv')

CodePudding user response:

Taking your code and restructuring it slightly to loop over all the years and construct a dictionary of split dataframes

many_dfs = {y: df_all[df_all['year']==y] for y in df_all['year'].unique()}

On further exploration of this dictionary

>>> many_dfs.keys()
dict_keys([1999, 2000, 1997, 1992, 1995, 1998, 1980, 1996, 1982, 1987, 1989, 1991, 1981, 1983, 1984, 1986, 1988, 1990, 1993, 1994, 1985, 2005, 2006, 2007, 2010, 2011, 2012, 2001, 2002, 2003, 2004, 2008, 2009, 2013,2014, 2017, 2019, 2021, 2015, 2018, 2020, 2016])
>>> type(many_dfs[2017])
<class 'pandas.core.frame.DataFrame'>
>>> many_dfs[2017].columns
Index(['Distrito', 'concelho', 'month', 'year', 'sum', 'longitude',
   'latitude'],
  dtype='object')
  • Related