Select 2nd (or nth) smallest value in column while keeping dataframe-CodePudding

I am using this code to select the smallest row in a column of a given df (got this appraoch from here):

data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
min_value = data.groupby('A').B.min()
data = data.merge(min_value, on='A',suffixes=('', '_min'))
data = data[data.B==data.B_min].drop('B_min', axis=1)

I would like to modify this such that I get the 2nd (or nth) lowest value for that column.

CodePudding user response：

You can find the nth lowest B per A and filter data.

data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
# sort data
data = data.sort_values(by=['A','B'])
# transform the 2nd lowest (n=1) for the row and filter
data = data[data['B'] == data.groupby('A')['B'].transform('nth', 1)]
print(data)

   A  B  C
0  1  4  3
5  2  6  6

You can select any nth by passing the rank to transform as arg.

CodePudding user response：

Try:

print(
    data.groupby("A", as_index=False).apply(
        lambda x: x.sort_values(by="B").iloc[1]
    )
)

Prints:

   A  B  C
0  1  4  3
1  2  6  6

CodePudding user response：

If your data is large, you could avoid sorting the data(which can be expensive), and instead use a combination of idxmin ( as shown in the solution you referenced) and nsmallest:

grouper = data.groupby('A').B
# get the minimum
minimum = grouper.idxmin()
# get the nsmallest rows (2 in this case)
smallest_2 = grouper.nsmallest(2).index.droplevel(0)
# alternative is smallest_2.difference(minimum)
smallest_2 = smallest_2[~smallest_2.isin(minimum)]
data.loc[smallest_2]

   A  B  C
0  1  4  3
5  2  6  6