How to identify minimum squared value of an entire pandas dataframe column by column?

Time:12-25

I have a pandas dataframe like this:

column1 | column2  | column3
1       | 4        |   10.4  
4       | 7        |   11.1
3       | 3        |   3.3

How can I calculate the sum of the squared values for each column (I am trying something like deviation = df[columnName].pow(2).sum() in a loop, but other ideas are very welcome!), and then identify both the smallest of those sums and the column it belongs to?

Edit: Adding desired output

Desired output in this case would be:

Minimum sum of squared values: 26
Column containing minimum sum of squared values: column1

CodePudding user response:

You can calculate the sum of squares on the entire DataFrame at once, which returns a Series with the column names as its index. You can then get the smallest sum and the column it belongs to using min and idxmin:

col_squares = df.pow(2).sum()

col_squares
#column1     26.00
#column2     74.00
#column3    242.26
#dtype: float64

col_squares.min(), col_squares.idxmin()
#(26.0, 'column1')
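
For completeness, here is a minimal self-contained sketch of the same approach. It assumes the sample data from the question, and the variable names min_sum and min_col are only illustrative. It prints the result in the format the question asks for:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

# Square every value, then sum down each column -> Series indexed by column name.
col_squares = df.pow(2).sum()

min_sum = col_squares.min()      # smallest of the per-column sums
min_col = col_squares.idxmin()   # label of the column holding that sum

print(f"Minimum sum of squared values: {min_sum:g}")
print(f"Column containing minimum sum of squared values: {min_col}")
```

Because pow and sum are vectorized, no explicit loop over the columns is needed.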

CodePudding user response:

A more explicit, easier-to-follow way to compute the same per-column sums is shown below:

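def minimum_square_sum_col(col):
    # Accumulate the sum of squared values for one column.
    sums = 0
    for i in col:
        sums += i**2
    return sums

# apply() calls the function once per column; dict() maps column name -> sum.
col_to_sum_dict = dict(df.apply(minimum_square_sum_col))
print(col_to_sum_dict)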
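
A dictionary of per-column sums does not by itself answer which column is smallest. As a rough sketch, assuming the sample data from the question and a hypothetical helper named square_sum, the built-in min with a key function can pick out the column:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": [1, 4, 3],
    "column2": [4, 7, 3],
    "column3": [10.4, 11.1, 3.3],
})

def square_sum(col):
    """Return the sum of squared values of one column (hypothetical helper)."""
    total = 0
    for value in col:
        total += value ** 2
    return total

# Map each column name to its sum of squares.
col_to_sum = {name: square_sum(df[name]) for name in df.columns}

# min() over the keys, compared by their associated sums.
min_col = min(col_to_sum, key=col_to_sum.get)
min_sum = col_to_sum[min_col]

print(min_sum, min_col)
```

This is easier to step through than the vectorized version, though on large frames the pow(2).sum() approach will generally be faster.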