I have the following dummy dataframe with two columns:
set.seed(666)
df = data.frame(group = rep(c(-1,1), times = 5, each = 1),
                values = runif(10))
   group     values
1     -1 0.77436849
2      1 0.19722419
3     -1 0.97801384
4      1 0.20132735
5     -1 0.36124443
6      1 0.74261194
7     -1 0.97872844
8      1 0.49811371
9     -1 0.01331584
10     1 0.25994613
I want to find the row index, in the original dataframe, of the maximum value within group "1". Applying which.max() after subsetting the dataframe returns the wrong row, because the result is a position within the subset rather than in the original dataframe:
which.max(df[df[,1]==1,2]) #returns '3'
As a workaround I did the following:
sdf = df[df[,1] == 1,] # subset of dataframe that keeps row names
rownames(sdf[which.max(sdf[,2]),]) # returns '6'
This does return the correct row index from the original dataframe (as a character string, since it is a row name). However, I feel there must be an easier, more elegant solution, but I can't think of anything else myself. Any ideas?
I feel really stupid for asking this question but it seems I'm overlooking something very simple.
CodePudding user response:
Use match() to create an integer vector with 1 marking the group of interest and NA for all other groups, then use it as a filter on values:
which.max(match(df$group, 1) * df$values)
match(df$group, 1) (or any other group value in place of 1) returns a vector of NA or 1. Multiplying it by df$values gives either NA or the original value. which.max() then selects the maximum of the original values, ignoring the NA values when computing the maximum, and returns its position in the full vector.
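To make the three steps concrete, here is a small sketch that stores each intermediate result. The names mask, masked_values and idx are only illustrative and are not part of the answer above; the data is rebuilt exactly as in the question.

```r
set.seed(666)
df <- data.frame(group = rep(c(-1, 1), times = 5), values = runif(10))

# Step 1: 1 where group == 1, NA everywhere else
mask <- match(df$group, 1)
mask
#>  [1] NA  1 NA  1 NA  1 NA  1 NA  1

# Step 2: multiplying keeps the group-1 values and turns the rest into NA
masked_values <- mask * df$values

# Step 3: which.max() skips NA and reports the position in the full vector
idx <- which.max(masked_values)
idx
#> [1] 6
```

Because the masked vector has the same length as the original column, the position returned by which.max() is already a row index of df.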
CodePudding user response:
This is the tidyverse way:
library(tidyverse)
set.seed(666)
df = data.frame(group = rep(c(-1,1), times = 5, each = 1),
                values = runif(10))
df %>%
  # store the original row position before reordering
  mutate(row_number = row_number()) %>%
  group_by(group) %>%
  # sort by values in descending order
  arrange(-values) %>%
  # pick the first row of each group, i.e. the maximum
  slice(1) %>%
  select(group, row_number)
#> # A tibble: 2 x 2
#> # Groups: group [2]
#> group row_number
#> <dbl> <int>
#> 1 -1 7
#> 2 1 6
Created on 2022-02-08 by the reprex package (v2.0.1)
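As a possible shorter variant, the sort-then-slice pair can be replaced by slice_max(), assuming dplyr 1.0.0 or later is installed. This is only a sketch of the same idea, and the name top_rows is illustrative.

```r
library(dplyr)

set.seed(666)
df <- data.frame(group = rep(c(-1, 1), times = 5), values = runif(10))

top_rows <- df %>%
  mutate(row_number = row_number()) %>%  # remember the original position
  group_by(group) %>%
  slice_max(values, n = 1) %>%           # keep the row with the largest value per group
  ungroup() %>%
  select(group, row_number)

top_rows
```

Note that slice_max() keeps tied rows by default (with_ties = TRUE), so a group can contribute more than one row if its maximum occurs several times.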
CodePudding user response:
Here I use a base R approach, but it's also not very elegant.
which(df[, 2] == max(df[which(df[, 1] == 1), 2]))
[1] 6
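One caveat with the comparison above: it matches against the whole column, so it would also return rows from other groups that happen to hold the same value. A minimal alternative sketch in base R is to keep the original row numbers of the group and index into them; rows_in_group and max_row are illustrative names.

```r
set.seed(666)
df <- data.frame(group = rep(c(-1, 1), times = 5), values = runif(10))

# Original row numbers of the rows that belong to group 1
rows_in_group <- which(df$group == 1)

# which.max() gives a position within the subset; map it back to the original row
max_row <- rows_in_group[which.max(df$values[rows_in_group])]
max_row
#> [1] 6
```

Unlike the == comparison, this returns a single row (the first one in case of ties) and only ever looks at rows of the chosen group.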