I have the following DataFrame:
df = pd.DataFrame({'a': [0.28, 0, 0.25, 0.85, 0.1],
'b': [0.5, 0.5, 0, 0.75, 0.1],
'c': [0.33, 0.7, 0.25, 0.2, 0.5],
'd': [0, 0.25, 0.2, 0.66, 0.1]})
Output:
a b c d
0 0.28 0.50 0.33 0.00
1 0.00 0.50 0.70 0.25
2 0.25 0.00 0.25 0.20
3 0.85 0.75 0.20 0.66
4 0.10 0.10 0.50 0.10
For each column, I want to sum the top n
max values of the column, where n
is determined by how many row max values that column contains.
For example, column b
has a row max only in row 1, so its sum is the sum of the top 1 max values in that column, which is just 0.5
-- but column c
has three row-maxes, located in rows 1, 2, and 4, so the top 3 max values of column c
should be summed.
Expected output:
a b c d
0 0.28 0.50 0.33 0.00
1 0.00 0.50 0.70 0.25
2 0.25 0.00 0.25 0.20
3 0.85 0.75 0.20 0.66
4 0.10 0.10 0.50 0.10
count 1.10 0.50 1.45 0.00
CodePudding user response:
where
df.append(
df.where( # only look at values that are max for the row
df.eq( # compare max values to all values in row just in case there are more than 1
df.max(axis=1), # actually get max values
axis=0
)
).sum().rename('count')
)
a b c d
0 0.28 0.50 0.33 0.00
1 0.00 0.60 0.50 0.25
2 0.25 0.00 1.00 0.20
3 0.85 0.75 0.20 0.66
4 0.10 0.10 0.50 0.10
count 0.85 1.10 1.50 0.00
CodePudding user response:
The fastest way to do this would be to using the .max() method passing through the axis argument:
df.max(axis =1)
If you're after another column:
df['column_name'] = df.max(axis =1)
I didn't read the question that well!