How can I create a column col4 that contains the 2nd largest value in each row?
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 1, 5],
                   [5, 2, 9],
                   [2, 9, 3],
                   [8, 5, 4]],
                  columns=["col_A", "col_B", "col_C"])
cols = np.array(df.columns)
df['col4'] = df.nlargest(2, columns=cols)  # wrong: nlargest selects whole rows, not a value per row
CodePudding user response:
You can use indexing on the output of np.sort:
N = 2
df['col4'] = np.sort(df)[:, -N]
Alternative with apply:
df['col4'] = df.apply(lambda r: r.nlargest(2).iloc[-1], axis=1)
output:
   col_A  col_B  col_C  col4
0      4      1      5     4
1      5      2      9     5
2      2      9      3     3
3      8      5      4     5
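For reference, here is a self-contained sketch of the np.sort approach with the imports included. The `value_cols` list and the `nth_largest` helper are names introduced here for illustration and are not part of the answer above. Selecting the source columns explicitly is an assumption on my part: it keeps col4 out of the computation if the assignment is run a second time.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 1, 5],
                   [5, 2, 9],
                   [2, 9, 3],
                   [8, 5, 4]],
                  columns=["col_A", "col_B", "col_C"])

# Hypothetical name: the columns that take part in the comparison.
value_cols = ["col_A", "col_B", "col_C"]


def nth_largest(frame, n):
    """Hypothetical helper: n-th largest value of each row, as a 1-D array."""
    # np.sort with axis=1 sorts every row in ascending order,
    # so position -n of each sorted row holds the n-th largest value.
    return np.sort(frame.to_numpy(), axis=1)[:, -n]


df["col4"] = nth_largest(df[value_cols], 2)
print(df)
```

Changing the second argument gives other ranks, for example `nth_largest(df[value_cols], 1)` for the row maximum.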
CodePudding user response:
For each row, you could sort the values and take the second-to-last one, as follows:
df["col4"] = df.apply(lambda x: sorted(x)[-2], axis=1)
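One thing to be aware of: `sorted(x)[-2]` counts repeated values separately, so a row whose maximum appears twice returns that maximum again. If the second-largest distinct value is what you need, a possible sketch is below. The sample rows and the `second_largest_distinct` helper are made up for illustration, and returning `None` for rows with fewer than two distinct values is an assumption about the desired behavior.

```python
import pandas as pd

# Illustrative data (not from the question): the first row has a tied maximum.
df = pd.DataFrame([[9, 9, 3],
                   [5, 2, 9]],
                  columns=["col_A", "col_B", "col_C"])


def second_largest_distinct(row):
    """Hypothetical helper: second-largest distinct value in a row."""
    distinct = sorted(set(row))  # drop duplicates, then sort ascending
    # Assumption: rows with fewer than two distinct values yield None.
    return distinct[-2] if len(distinct) >= 2 else None


# Ties counted separately, as in the one-liner above.
with_ties = df.apply(lambda x: sorted(x)[-2], axis=1)
# Ties collapsed before ranking.
distinct_only = df.apply(second_largest_distinct, axis=1)

print(with_ties.tolist())
print(distinct_only.tolist())
```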