I have a pandas dataframe like this:
column1 | column2 | column3
1 | 4 | 10.4
4 | 7 | 11.1
3 | 3 | 3.3
How can I calculate the sum of the squared values for each column? I am currently trying something like deviation = df[columnName].pow(2).sum() in a loop, but other ideas are very welcome. Afterwards, I also want to identify the column that has the smallest of those sums, along with the smallest sum itself.
Edit: Adding desired output
Desired output in this case would be:
Minimum sum of squared values: 26
Column containing minimum sum of squared values: column1
CodePudding user response:
You can calculate the sum of squares on the entire DataFrame at once, which returns a Series with the column names as its index. You can then find the minimum value and the index label of that minimum using min and idxmin:
col_squares = df.pow(2).sum()
col_squares
#column1 26.00
#column2 74.00
#column3 242.26
#dtype: float64
col_squares.min(), col_squares.idxmin()
#(26.0, 'column1')
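For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built from the three sample rows in the question; the variable names min_sum and min_col are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

# Square every value, then sum down each column -> Series indexed by column name.
col_squares = df.pow(2).sum()

min_sum = col_squares.min()     # smallest sum of squares
min_col = col_squares.idxmin()  # label of the column holding that smallest sum

print("Minimum sum of squared values:", min_sum)
print("Column containing minimum sum of squared values:", min_col)
```

Because df.pow(2).sum() is vectorized, no explicit loop over the column names is needed.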
CodePudding user response:
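A more explicit, step-by-step way to get the per-column sums is shown below:
def minimum_square_sum_col(col):
    # Accumulate the square of every value in the column.
    sums = 0
    for i in col:
        sums += i**2
    return sums
# apply() calls the function once per column; dict() maps column name -> sum.
col_to_sum_dict = dict(df.apply(minimum_square_sum_col))
print(col_to_sum_dict)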
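A dictionary of per-column sums still leaves the second half of the question open, namely which column has the smallest sum. One possible way to finish, sketched here under the assumption that the data is the sample from the question, is to take min over the dictionary keys ranked by their values. The names col_to_sum and best_col are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

# Plain-Python sum of squares per column, stored as {column name: sum}.
col_to_sum = {name: sum(v ** 2 for v in df[name]) for name in df.columns}

# min() over the keys, ranked by their values, picks the column with the smallest sum.
best_col = min(col_to_sum, key=col_to_sum.get)

print("Minimum sum of squared values:", col_to_sum[best_col])
print("Column containing minimum sum of squared values:", best_col)
```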