How to add an arbitrary number of DataFrames

I have the following

# df1, df2, final_df have same index
final_df = ...
df1 = ...
df2 = ...
sum_cols = ['a', 'b', 'c']
final_df[sum_cols] = df1[sum_cols] + df2[sum_cols]

Now I want to do this for an arbitrary number of DataFrames:

final_df = ...
dfs = [df1, df2, df3, ...] # they all have same index
final_df[sum_cols] = df1[sum_cols] + df2[sum_cols] + df3[sum_cols] + ...

How do I do this nicely, without a for loop?

CodePudding user response：

You can use reduce (or you can use a for loop).

import functools
import operator

sum_cols = ['a', 'b', 'c']
dataframes = [...] # list of dataframes
final_df[sum_cols] = functools.reduce(operator.add, [d[sum_cols] for d in dataframes])

reduce has the advantage over sum() that it needs no initial value. sum() starts from 0 by default, so its first step is 0 + df.
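To make the reduce approach concrete, here is a minimal runnable sketch. The sample frames, their values, and the index labels are made up for illustration; only the reduce line comes from the answer above.

```python
import functools
import operator

import pandas as pd

# Hypothetical sample data; all frames share the same index.
index = ["x", "y"]
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=index)
df2 = pd.DataFrame({"a": [10, 20], "b": [30, 40], "c": [50, 60]}, index=index)
df3 = pd.DataFrame({"a": [100, 200], "b": [300, 400], "c": [500, 600]}, index=index)

sum_cols = ["a", "b", "c"]
dataframes = [df1, df2, df3]

# Target frame with the summed columns already present.
final_df = pd.DataFrame(0, index=index, columns=sum_cols)

# reduce applies operator.add pairwise: (df1 + df2) + df3
final_df[sum_cols] = functools.reduce(
    operator.add, [d[sum_cols] for d in dataframes]
)
```

Note that reduce raises a TypeError on an empty list unless an initial value is passed, so this sketch assumes at least one frame.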

A plain loop is also fine and short to write. It is efficient too, since it creates no needless extra objects or intermediates.

final_df[sum_cols] = dataframes[0][sum_cols]
for d in dataframes[1:]:
    final_df[sum_cols] += d[sum_cols]
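If the loop is needed in several places, it could be wrapped in a small function. This is only a sketch: sum_columns is a hypothetical helper name, and the sample frames are invented for the demonstration.

```python
import pandas as pd


def sum_columns(frames, cols):
    """Hypothetical helper: element-wise sum of `cols` over a non-empty list of frames."""
    total = frames[0][cols].copy()  # copy so the first frame is left untouched
    for frame in frames[1:]:
        total += frame[cols]  # aligns on index and column labels
    return total


# Made-up sample data sharing one index.
index = [0, 1]
frames = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=index),
    pd.DataFrame({"a": [10, 20], "b": [30, 40], "c": [50, 60]}, index=index),
]
result = sum_columns(frames, ["a", "b", "c"])
```

The result can then be assigned with final_df[sum_cols] = result, as in the snippets above.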

You could even do this with pandas operations alone, but I would not recommend it. At this point we are just wasting time copying data:

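# sum(level=...) was removed in pandas 2.0, so group the transposed frame by column name instead;
# the trailing [sum_cols] restores the column order, because groupby sorts its keys
final_df[sum_cols] = pd.concat([d[sum_cols] for d in dataframes], axis=1, keys=range(len(dataframes))).T.groupby(level=1).sum().T[sum_cols]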

pd.concat has an option copy=False but in practice it doesn't save copying in most cases.

These alternatives are offered so that it is easier to be content with one of them. Maybe just a loop? :)

CodePudding user response：

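You can use the built-in sum() function:

final_df[sum_cols] = sum(d[sum_cols] for d in dfs)

This works if the summed columns are all numeric, because sum() starts from 0 and the first step is 0 + df.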
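As a rough illustration of why the numeric condition matters: sum() begins at 0, so it first computes 0 + df. One possible variation, sketched below with invented sample data, passes the first frame as the start value so that no 0 is involved. The variable names here are assumptions for the example, not part of the original answer.

```python
import pandas as pd

# Made-up sample data; "other" is a column that should not be summed.
index = ["x", "y"]
dfs = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "other": [0, 0]}, index=index),
    pd.DataFrame({"a": [10, 20], "b": [30, 40], "c": [50, 60], "other": [9, 9]}, index=index),
]
sum_cols = ["a", "b", "c"]

# Select the columns first so the result has exactly len(sum_cols) columns.
selected = [d[sum_cols] for d in dfs]

total_default = sum(selected)                  # 0 + df1 + df2
total_seeded = sum(selected[1:], selected[0])  # df1 + df2, no initial 0
```

Both forms give the same result for numeric columns. The seeded form assumes the list is not empty.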