I have the following:
# df1, df2, final_df have same index
final_df = ...
df1 = ...
df2 = ...
sum_cols = ['a', 'b', 'c']
final_df[sum_cols] = df1[sum_cols] + df2[sum_cols]
Now I want to do this for an arbitrary number of dfs:
final_df = ...
dfs = [df1, df2, df3, ...] # they all have same index
final_df[sum_cols] = df1[sum_cols] + df2[sum_cols] + df3[sum_cols] + ...
How do I do this nicely, without a for loop?
CodePudding user response:
You can use reduce (or you can use a for loop).
import functools
import operator
sum_cols = ['a', 'b', 'c']
dataframes = [...] # list of dataframes
final_df[sum_cols] = functools.reduce(operator.add, [d[sum_cols] for d in dataframes])
reduce has the advantage over sum() that it doesn't use an initial value (which is 0 by default for sum).
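To make the reduce approach concrete, here is a minimal runnable sketch. The toy frames, the index labels and the extra column `other` are made up for illustration; only `sum_cols`, `dataframes` and `final_df` come from the thread.

```python
import functools
import operator

import pandas as pd

sum_cols = ["a", "b", "c"]
index = ["x", "y"]  # hypothetical shared index

# Three hypothetical frames with identical index and one column we do not sum
dataframes = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "other": [0, 0]}, index=index)
    for _ in range(3)
]
final_df = pd.DataFrame(0, index=index, columns=sum_cols)

# reduce folds the list pairwise: ((df1 + df2) + df3), with no initial 0
final_df[sum_cols] = functools.reduce(operator.add, [d[sum_cols] for d in dataframes])
print(final_df)
```

Because the fold starts from the first frame itself, nothing has to be added to a scalar first.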
Just using a loop might also be fine, and it is short to write. It is efficient too: we don't create needless extra objects or intermediates.
final_df[sum_cols] = dataframes[0][sum_cols]
for d in dataframes[1:]:
    final_df[sum_cols] += d[sum_cols]
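For reference, a self-contained run of the accumulating loop on toy data. This is only a sketch: the frame contents and index labels are invented, and the loop adds each remaining frame onto the running total with `+=`.

```python
import pandas as pd

sum_cols = ["a", "b", "c"]
index = ["x", "y"]  # hypothetical shared index

# Hypothetical frames: frame k holds the value k in every cell
dataframes = [
    pd.DataFrame({col: [k, k] for col in sum_cols}, index=index)
    for k in (1, 2, 3)
]
final_df = pd.DataFrame(0, index=index, columns=sum_cols)

# Start from the first frame, then accumulate the rest
final_df[sum_cols] = dataframes[0][sum_cols]
for d in dataframes[1:]:
    final_df[sum_cols] += d[sum_cols]

print(final_df)
```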
You could even try to do this in just pandas operations. But I would not recommend this. We're just wasting time by copying data at this point:
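# DataFrame.sum(level=...) was removed in pandas 2.0, so group on the transposed frame instead
final_df[sum_cols] = pd.concat([d[sum_cols] for d in dataframes], axis=1, keys=range(len(dataframes))).T.groupby(level=1).sum().T[sum_cols]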
pd.concat has an option copy=False, but in practice it doesn't save copying in most cases.
The alternatives are listed so that it becomes easier to settle on one of them. Maybe just the loop? :)
CodePudding user response:
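You can use the built-in sum() function, selecting the columns you want from each frame:
final_df[sum_cols] = sum(df[sum_cols] for df in dfs)
This works as long as the summed columns are all numeric, because sum() starts from 0 and adds the first frame to it.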
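A small sketch of the built-in `sum()` on DataFrames, assuming every selected column is numeric. The toy data is invented; the second call uses the `start` argument (Python 3.8+) to begin from the first frame instead of 0.

```python
import pandas as pd

sum_cols = ["a", "b", "c"]
index = ["x", "y"]  # hypothetical shared index

# Hypothetical numeric frames with an extra column that is not summed
dfs = [
    pd.DataFrame({"a": [1, 2], "b": [10, 20], "c": [100, 200], "other": [7, 7]}, index=index)
    for _ in range(2)
]
final_df = pd.DataFrame(0, index=index, columns=sum_cols)

# Default start value is 0, so the first step is 0 + df (fine for numeric data)
final_df[sum_cols] = sum(df[sum_cols] for df in dfs)

# Same result, but starting from the first frame rather than the scalar 0
total = sum((df[sum_cols] for df in dfs[1:]), start=dfs[0][sum_cols])
print(final_df)
```

Selecting `df[sum_cols]` inside the generator keeps columns outside `sum_cols` out of the result.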