I have a dataframe as a result of various pivot operations with float numbers (this example using integers for simplicity)
import numpy as np
import pandas as pd
np.random.seed(365)
rows = 10
cols= {'col_a': [np.random.randint(100) for _ in range(rows)],
'col_b': [np.random.randint(100) for _ in range(rows)],
'col_c': [np.random.randint(100) for _ in range(rows)]}
data = pd.DataFrame(cols)
data
col_a col_b col_c
0 82 36 43
1 52 48 12
2 33 28 77
3 91 99 11
4 44 95 27
5 5 94 64
6 98 3 88
7 73 39 92
8 26 39 62
9 56 74 50
I want to detect the two minimum values in a row and get the diff in a new column. For example, in first row, the 2 minimum values are 36 and 43, so the difference will be 7
I've tried this way:
data['difference']=data[data.apply(lambda x: x.nsmallest(2).astype(float), axis=1).isna()].subtract(axis=1)
but i get:
TypeError: f() missing 1 required positional argument: 'other'
CodePudding user response:
Better use numpy:
a = np.sort(data)
data['difference'] = a[:,1]-a[:,0]
output:
col_a col_b col_c difference
0 82 36 43 7
1 52 48 12 36
2 33 28 77 5
3 91 99 11 80
4 44 95 27 17
5 5 94 64 59
6 98 3 88 85
7 73 39 92 34
8 26 39 62 13
9 56 74 50 6
CodePudding user response:
Follow your idea with nsmallest
on rows
data['difference'] = data.apply(lambda x: x.nsmallest(2).tolist(), axis=1, result_type='expand').diff(axis=1)[1]
# or
data['difference'] = data.apply(lambda x: x.nsmallest(2).diff().iloc[-1], axis=1)
print(data)
col_a col_b col_c difference
0 82 36 43 7
1 52 48 12 36
2 33 28 77 5
3 91 99 11 80
4 44 95 27 17
5 5 94 64 59
6 98 3 88 85
7 73 39 92 34
8 26 39 62 13
9 56 74 50 6
CodePudding user response:
Here is a way using rank()
(df.where(
df.rank(axis=1,method = 'first')
.le(2))
.stack()
.sort_values()
.groupby(level=0)
.agg(lambda x: x.diff().sum()))
If your df was larger and you wanted to potentially use more than the 2 smallest, this should work
(df.where(
df.rank(axis=1,method = 'first')
.le(2))
.stack()
.sort_values(ascending=False)
.groupby(level=0)
.agg(lambda x: x.mul(-1).cumsum().add(x.max()*2).iloc[-1]))