I have this df:
data = {
'Name': ['Tom', 'nick', 'krish', 'jack'],
'A': [20, 21, 19, 18],
'B': [3, 6, 2, 1],
'C': [6, 14, 5, 17],
'D': [2, 10, 9, 98]
}
people = pd.DataFrame(data)
people["max_1"]=people[['A','B','C','D']].max(axis=1)
people
So I've added a new column - max_1 for the maximum value in each row from columns A, B, C, and D. My question is how can I create new columns (max_2 and max_3) for the 2nd highest value and for the third highest value?
Additional question - is it possible to add another condition on top of it? For example, find the maximum values but only when the names are 'Tom'/'nick'/'krish' -> otherwise, set 0 for those rows.
Thanks in advance.
CodePudding user response:
A solution with apply and nlargest.
import pandas as pd
data = {
'Name': ['Tom', 'nick', 'krish', 'jack'],
'A': [20, 21, 19, 18],
'B': [3, 6, 2, 1],
'C': [6, 14, 5, 17],
'D': [2, 10, 9, 98]
}
people = pd.DataFrame(data)
# Solution
# Set Name to index. So it does not interfere when we do things with numbers.
people = people.set_index("Name")
# Apply nlargest to each row.
# Not efficient because we us apply. But the good part that there is not much code.
top3 = people.apply(lambda x: pd.Series(x.nlargest(3).values), axis=1)
people[["N1", "N2", "N3"]] = top3
Result
A B C D N1 N2 N3
Name
Tom 20 3 6 2 20 6 3
nick 21 6 14 10 21 14 10
krish 19 2 5 9 19 9 5
jack 18 1 17 98 98 18 17
CodePudding user response:
Use:
#number of columns
N = 3
#columns names
cols = ['A','B','C','D']
#new columns names
new = [f'max_{i 1}' for i in range(N)]
#condition for test membership
mask = people['Name'].isin(['Tom','nick'])
#new columns filled 0
people[new] = 0
#for filtered rows get top N values
people.loc[mask, new] = np.sort(people.loc[mask, cols].to_numpy(), axis=1)[:, -N:][:, ::-1]
print (people)
Name A B C D max_1 max_2 max_3
0 Tom 20 3 6 2 20 6 3
1 nick 21 6 14 10 21 14 10
2 krish 19 2 5 9 0 0 0
3 jack 18 1 17 98 0 0 0
Soluton with numpy.where
and broadcasting:
N = 3
cols = ['A','B','C','D']
new = [f'max_{i 1}' for i in range(N)]
mask = people['Name'].isin(['Tom','nick'])
people[new] = np.where(mask.to_numpy()[:, None],
np.sort(people[cols].to_numpy(), axis=1)[:, -N:][:, ::-1],
0)
print (people)
Name A B C D max_1 max_2 max_3
0 Tom 20 3 6 2 20 6 3
1 nick 21 6 14 10 21 14 10
2 krish 19 2 5 9 0 0 0
3 jack 18 1 17 98 0 0 0
CodePudding user response:
You can do :
# to get max_2
people['max_2'] = [np.sort(people[['A','B','C','D']].iloc[:])[i][2] for i in range(len(people))]
# to get max_3
people['max_3'] = [np.sort(people[['A','B','C','D']].iloc[:])[i][1] for i in range(len(people))]
CodePudding user response:
n = 3
idx = [f'max_{i}' for i in range(1, 1 n)]
df = people.iloc[:, 1:].apply(lambda x: x.nlargest(n).set_axis(idx), axis=1)
people.join(df)
result:
Name A B C D max_1 max_2 max_3
0 Tom 20 3 6 2 20 6 3
1 nick 21 6 14 10 21 14 10
2 krish 19 2 5 9 19 9 5
3 jack 18 1 17 98 98 18 17
change n
to what you want
CodePudding user response:
Use
import numpy as np
people[['max_1','max_2','max_3']] = \
people[['A','B','C','D']].apply(lambda x: -np.sort(-x), axis=1, raw=True).iloc[:, 0:3]
people
# Out:
# Name A B C D max_1 max_2 max_3
# 0 Tom 20 3 3 2 20 3 3
# 1 nick 21 6 14 10 21 14 10
# 2 krish 19 2 5 9 19 9 5
# 3 jack 18 1 17 98 98 18 17
Note that I changed the data a bit to show what happens in case of duplicate values
# data = {
# 'Name': ['Tom', 'nick', 'krish', 'jack'],
# 'A': [20, 21, 19, 18],
# 'B': [3, 6, 2, 1],
# 'C': [3, 14, 5, 17],
# 'D': [2, 10, 9, 98]
# }
# people = pd.DataFrame(data)
people
# Out:
# Name A B C D
# 0 Tom 20 3 3 2
# 1 nick 21 6 14 10
# 2 krish 19 2 5 9
# 3 jack 18 1 17 98