get difference between the two minimum values rowwise pandas columns-CodePudding

I have a dataframe as a result of various pivot operations with float numbers (this example using integers for simplicity)

    import numpy as np
    import pandas as pd
    
    np.random.seed(365)
    rows = 10
    cols= {'col_a': [np.random.randint(100) for _ in range(rows)],
           'col_b': [np.random.randint(100) for _ in range(rows)],
           'col_c': [np.random.randint(100) for _ in range(rows)]}
    data = pd.DataFrame(cols)


data
    col_a   col_b   col_c
0   82        36    43
1   52        48    12
2   33        28    77
3   91        99    11
4   44        95    27
5   5         94    64
6   98         3    88
7   73        39    92
8   26        39    62
9   56        74    50

I want to detect the two minimum values in a row and get the diff in a new column. For example, in first row, the 2 minimum values are 36 and 43, so the difference will be 7

I've tried this way:

data['difference']=data[data.apply(lambda x: x.nsmallest(2).astype(float), axis=1).isna()].subtract(axis=1)

but i get:

TypeError: f() missing 1 required positional argument: 'other'

CodePudding user response：

Better use numpy:

a = np.sort(data)
data['difference'] = a[:,1]-a[:,0]

output:

   col_a  col_b  col_c  difference
0     82     36     43           7
1     52     48     12          36
2     33     28     77           5
3     91     99     11          80
4     44     95     27          17
5      5     94     64          59
6     98      3     88          85
7     73     39     92          34
8     26     39     62          13
9     56     74     50           6

CodePudding user response：

Follow your idea with nsmallest on rows

data['difference'] = data.apply(lambda x: x.nsmallest(2).tolist(), axis=1, result_type='expand').diff(axis=1)[1]
# or
data['difference'] = data.apply(lambda x: x.nsmallest(2).diff().iloc[-1], axis=1)

print(data)

   col_a  col_b  col_c  difference
0     82     36     43           7
1     52     48     12          36
2     33     28     77           5
3     91     99     11          80
4     44     95     27          17
5      5     94     64          59
6     98      3     88          85
7     73     39     92          34
8     26     39     62          13
9     56     74     50           6

CodePudding user response：

Here is a way using rank()

(df.where(
    df.rank(axis=1,method = 'first')
    .le(2))
    .stack()
    .sort_values()
    .groupby(level=0)
    .agg(lambda x: x.diff().sum()))

If your df was larger and you wanted to potentially use more than the 2 smallest, this should work

(df.where(
    df.rank(axis=1,method = 'first')
    .le(2))
    .stack()
    .sort_values(ascending=False)
    .groupby(level=0)
    .agg(lambda x: x.mul(-1).cumsum().add(x.max()*2).iloc[-1]))