Is there any function to find out first two or n largest numbers and take an average of those two into a different column in pandas.
Note: Time or any non-numeric column to be ignored.
time |
n1 | n2 | n3 | n4 | average_largest_2 |
---|---|---|---|---|---|
11:50 | 1 | 2 | 3 |
4 |
3.5 |
12:50 | 5 | 6 | 7 |
8 |
7.5 |
13:50 | 8 |
7 |
6 | 5 | 7.5 |
Use this code if need be:
import pandas as pd
import numpy as np
time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
'n1': [1, 5, 8],
'n2': [2, 6 ,7],
'n3': [3, 7 ,6],
'n4': [4, 8, 5],
}
df1 = pd.DataFrame(data = data_1)
df1
Expected Result: Average_largest_2_column
CodePudding user response:
You can use nlargest
per row and get the mean:
df1['average_largest_2'] = (df1.select_dtypes('number')
.apply(lambda r: r.nlargest(2).mean(), axis=1)
)
Or using the underlying numpy array:
a = df1.select_dtypes('number').to_numpy()
df1['average_largest_2'] = np.sort(a)[:,-2:].mean(1)
Output:
time n1 n2 n3 n4 average_largest_2
0 11:50 1 2 3 4 3.5
1 12:50 5 6 7 8 7.5
2 13:50 8 7 6 5 7.5
CodePudding user response:
import pandas as pd
import numpy as np
def top_two(ls):
ls = sorted(ls)
return ls[-2:]
time = ['11:50', '12:50', '13:50']
df = pd.DataFrame( {'time': time,
'n1': [1, 5, 8],
'n2': [2, 6 ,7],
'n3': [3, 7 ,6],
'n4': [4, 8, 5],
})
for index, row in df.iterrows():
avg = sum(top_two([row['n1'],row['n2'],row['n3'],row['n4']]))/2
df['avg'] = avg
print(df)