Pandas: How to merge multiple DataFrames using a for loop?

Time:10-06

I have more than 20 DataFrames in my current database, and I want to concatenate all of them. However, my current code is clumsy. First I build a dictionary, and from that dictionary I create the individual DataFrames. Here is my code:

import copy

import pandas as pd

dict_of_df = {}

file_dir = '../VentureXpert/Raw/county_csv/'

# Read one CSV per year and store each DataFrame under a key such as 'df_1996'.
for year in range(1996, 2020):
    key_name = 'df_' + str(year)
    df = pd.read_csv(file_dir + str(year) + '.csv', header=None)
    df.columns = ['st-county', 'numb of comp', 'pct of comp', 'inv sum($mil)', 'pct of inv', 'avg per comp', 'med per comp', 'year']
    dict_of_df[key_name] = copy.deepcopy(df)

Here is how I create the individual DataFrames:

df_1996 = dict_of_df['df_1996']
df_1997 = dict_of_df['df_1997']
df_1998 = dict_of_df['df_1998']
df_1999 = dict_of_df['df_1999']

df_2000 = dict_of_df['df_2000']
df_2001 = dict_of_df['df_2001']
df_2002 = dict_of_df['df_2002']
df_2003 = dict_of_df['df_2003']
df_2004 = dict_of_df['df_2004']
df_2005 = dict_of_df['df_2005']
df_2006 = dict_of_df['df_2006']
df_2007 = dict_of_df['df_2007']
df_2008 = dict_of_df['df_2008']
df_2009 = dict_of_df['df_2009']
df_2010 = dict_of_df['df_2010']

df_2011 = dict_of_df['df_2011']
df_2012 = dict_of_df['df_2012']
df_2013 = dict_of_df['df_2013']
df_2014 = dict_of_df['df_2014']
df_2015 = dict_of_df['df_2015']
df_2016 = dict_of_df['df_2016']
df_2017 = dict_of_df['df_2017']
df_2018 = dict_of_df['df_2018']
df_2019 = dict_of_df['df_2019']

Then I used pd.concat to merge these DataFrames:

df_final = pd.concat([df_1996,df_1997,df_1998,df_1999,df_2000,df_2001,df_2002,df_2003,df_2004,df_2005,df_2006,df_2007,df_2008,df_2009,df_2010,df_2011,df_2012,df_2013,df_2014,df_2015,df_2016,df_2017,df_2018,df_2019], ignore_index=True)

Is there an easier way to do this?

I would like to use a for loop for this.

Thanks in advance

CodePudding user response:

The problem is the separately named variables df_1996, df_1997, df_1998, df_1999, df_2000, and so on; creating one variable per DataFrame like this is not recommended in Python.

Since you already build a dict of DataFrames, dict_of_df, the solution simplifies a lot:

df_final = pd.concat(dict_of_df.values(), ignore_index=True)
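Since the question asks for a for loop, here is a minimal sketch of the same idea without the dict: append each DataFrame to a list inside the loop and call pd.concat once at the end. The in-memory CSV strings, the shortened column list, and the names fake_files and frames are assumptions made up for illustration so that the snippet runs without the real files.

```python
import io

import pandas as pd

# Shortened, hypothetical column list (the real files have eight columns).
COLUMNS = ['st-county', 'numb of comp', 'year']

# Made-up stand-ins for the per-year CSV files, keyed by year.
fake_files = {
    1996: "A,1,1996\nB,2,1996\n",
    1997: "A,3,1997\n",
    1998: "C,4,1998\nD,5,1998\n",
}

frames = []
for year in range(1996, 1999):
    # With real files this would read file_dir + str(year) + '.csv' instead.
    df = pd.read_csv(io.StringIO(fake_files[year]), header=None, names=COLUMNS)
    frames.append(df)

# Concatenate once, after the loop, and renumber the rows from 0.
df_final = pd.concat(frames, ignore_index=True)
print(df_final)
```

Calling pd.concat a single time after the loop avoids copying the accumulated data on every iteration, which is what happens if the frames are concatenated one by one inside the loop.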

If you also need a new column built from the dict keys:

df_final = pd.concat(dict_of_df).reset_index(level=1, drop=True).reset_index()
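To see what the key column looks like, here is a small sketch on two made-up DataFrames. Passing a dict to pd.concat puts the keys in the outer index level; the names argument is used here only to give that level a readable name ('source' is a hypothetical choice), so the column is not simply called index after reset_index.

```python
import pandas as pd

# Made-up sample frames standing in for the real per-year data.
dict_of_df = {
    'df_1996': pd.DataFrame({'st-county': ['A', 'B'], 'numb of comp': [1, 2]}),
    'df_1997': pd.DataFrame({'st-county': ['A'], 'numb of comp': [3]}),
}

df_final = (
    pd.concat(dict_of_df, names=['source'])  # dict keys become the outer index level
      .reset_index(level=1, drop=True)       # drop the original row numbers
      .reset_index()                         # move the keys into a 'source' column
)
print(df_final)
```

Each row now carries the key of the DataFrame it came from, for example 'df_1996', in the source column.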