I have a list of identical DataFrames and I am trying to sum one column in each DataFrame in the list. My thought is something like `total = [df['A'].sum for df in dfs]`, but this returns a list with one entry per DataFrame in `dfs`, and every entry is a method object rather than a number. My desired output is a list of the column sum for each DataFrame. What is the fastest way to achieve this? I have to repeat this sum thousands of times per list, on thousands of different lists.
Answer:
Perhaps you are missing the parentheses `()` after `sum`:
```python
total = [df['A'].sum() for df in dfs]
```
You want to call the `sum` method, not just reference it.
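A small illustration of the difference, using two made-up DataFrames (the data is invented for the example): without `()` each list element is the bound method itself, and with `()` the method runs and returns the column total.

```python
import pandas as pd

# Two small example DataFrames with the same columns (made-up data).
dfs = [
    pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}),
    pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]}),
]

# Without parentheses: each element is the bound method object, not a number.
not_called = [df["A"].sum for df in dfs]

# With parentheses: the method is called and returns the column total.
total = [df["A"].sum() for df in dfs]

print(callable(not_called[0]))  # True
print(total)
```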
Python's built-in `sum` is pretty quick (see "Python built-in sum function vs. for loop performance"), and I assume that pandas' `sum` should be comparable.
See also: "Difference between sum, 'sum' and np.sum *under the hood* (Python / Pandas / Numpy)".
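Since the question is about speed over thousands of repetitions, here is a sketch of two NumPy-based alternatives; it has not been benchmarked here, so measure it against the list comprehension above (for example with `timeit`) on your own data. The helper names `column_sums` and `column_sums_stacked` are hypothetical, and the stacked version assumes every DataFrame has the same number of rows, which "identical DataFrames" suggests but does not guarantee. Note that `Series.sum()` skips NaN by default while `ndarray.sum()` does not (use `np.nansum` if your columns can contain NaN).

```python
import numpy as np
import pandas as pd


def column_sums(dfs, col="A"):
    """Hypothetical helper: sum column `col` of each DataFrame via NumPy."""
    # .to_numpy() hands back the underlying array, so the sum skips some
    # pandas per-call overhead (e.g. the skipna handling). This may be
    # faster for many small frames, but NaN values are NOT skipped.
    return [df[col].to_numpy().sum() for df in dfs]


def column_sums_stacked(dfs, col="A"):
    """Hypothetical helper: same result, assuming equal row counts."""
    # Stack the columns into one 2-D array (one row per DataFrame) and
    # sum every row with a single vectorised call.
    stacked = np.stack([df[col].to_numpy() for df in dfs])
    return stacked.sum(axis=1)


# Usage with made-up data:
dfs = [pd.DataFrame({"A": [1, 2, 3]}), pd.DataFrame({"A": [10, 20, 30]})]
print(column_sums(dfs))
print(column_sums_stacked(dfs))
```

The stacked variant does one Python-level loop to collect the arrays and then a single NumPy reduction, which is the part most likely to pay off when the lists are long; whether it beats the plain comprehension depends on frame size and count.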