Elegant way to correctly index a Dataframe after Subsetting

I have the following dummy dataframe with two columns:

set.seed(666)
df = data.frame(group = rep(c(-1,1), times = 5, each = 1),
                values = runif(10))

   group     values
1     -1 0.77436849
2      1 0.19722419
3     -1 0.97801384
4      1 0.20132735
5     -1 0.36124443
6      1 0.74261194
7     -1 0.97872844
8      1 0.49811371
9     -1 0.01331584
10     1 0.25994613

I want to find the row index, in the original dataframe, of the maximum value within group "1". Applying which.max() after subsetting the dataframe returns the position within the subset, not the row in the original dataframe:

which.max(df[df[,1]==1,2]) # returns 3, the position within the subset

As a workaround I did the following:

sdf = df[df[,1] == 1,] # subset of dataframe that keeps row names
rownames(sdf[which.max(sdf[,2]),]) # returns '6'

This does return the correct row index from the original dataframe. However, I feel there must be an easier, more elegant solution, but I can't think of one myself. Any ideas?

I feel really stupid for asking this question but it seems I'm overlooking something very simple.

CodePudding user response：

Use match() to create an integer vector that is 1 for the group of interest and NA for all other groups, then multiply it by the values so that only that group's values remain:

which.max(match(df$group, 1) * df$values)

match(df$group, x) (for any single value x, not just 1) returns a vector of NA or 1. Multiplying it by df$values gives either NA or the original value. which.max() then selects the maximum of the remaining values, discarding the NA values when it computes the maximum, and because the vector still has the original length, the result is an index into the original dataframe.
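To make the intermediate steps visible, here is a minimal sketch of the same idea split into parts. The data frame is rebuilt from the values printed in the question so that it does not depend on the random number generator; the names mask, masked and idx are only illustrative.

```r
# Rebuild the example data from the values printed in the question
df <- data.frame(
  group  = rep(c(-1, 1), times = 5),
  values = c(0.77436849, 0.19722419, 0.97801384, 0.20132735, 0.36124443,
             0.74261194, 0.97872844, 0.49811371, 0.01331584, 0.25994613)
)

mask   <- match(df$group, 1)   # 1 where group == 1, NA everywhere else
masked <- mask * df$values     # group 1 keeps its value, other rows become NA
idx    <- which.max(masked)    # NA entries are discarded, the index is not shifted

print(masked)
print(idx)                     # 6
```

Because masked has the same length as the original columns, the position returned by which.max() can be used directly as a row index into df.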

CodePudding user response：

This is the tidyverse way:

library(tidyverse)
set.seed(666)
df = data.frame(group = rep(c(-1,1), times = 5, each = 1),
                values = runif(10))

df %>%
  mutate(row_number = row_number()) %>%
  group_by(group) %>%
  # sort by values in descending order
  arrange(-values) %>%
  # pick the first row of each group, i.e. the maximum
  slice(1) %>%
  select(group, row_number)
#> # A tibble: 2 x 2
#> # Groups:   group [2]
#>   group row_number
#>   <dbl>      <int>
#> 1    -1          7
#> 2     1          6

Created on 2022-02-08 by the reprex package (v2.0.1)
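As a possible shortening of the sort-then-slice step, the sketch below uses slice_max(), assuming dplyr 1.0.0 or later is installed. The data frame is rebuilt from the values printed in the question, and the name res is only illustrative.

```r
library(dplyr)

# Rebuild the example data from the values printed in the question
df <- data.frame(
  group  = rep(c(-1, 1), times = 5),
  values = c(0.77436849, 0.19722419, 0.97801384, 0.20132735, 0.36124443,
             0.74261194, 0.97872844, 0.49811371, 0.01331584, 0.25994613)
)

res <- df %>%
  mutate(row_number = row_number()) %>%                # remember the original row
  group_by(group) %>%
  slice_max(values, n = 1, with_ties = FALSE) %>%      # one maximum row per group
  ungroup() %>%
  select(group, row_number)

print(res)   # group -1 -> row 7, group 1 -> row 6
```

Setting with_ties = FALSE keeps exactly one row per group even if the maximum occurs more than once, which mirrors what slice(1) does after sorting.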

CodePudding user response：

Here is a base R approach, though it is also not very elegant.

which(df[, 2] == max(df[which(df[, 1] == 1), 2]))
[1] 6
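One caveat with comparing values against the group maximum is that a row from another group with the same value would also be returned. A sketch of an alternative that avoids this is to keep the original row positions with which() and index into them; the names idx, by_group and df2 below are only illustrative, and the data frame is rebuilt from the values printed in the question.

```r
# Rebuild the example data from the values printed in the question
df <- data.frame(
  group  = rep(c(-1, 1), times = 5),
  values = c(0.77436849, 0.19722419, 0.97801384, 0.20132735, 0.36124443,
             0.74261194, 0.97872844, 0.49811371, 0.01331584, 0.25994613)
)

# Original row positions of group 1, then pick the one holding the maximum
idx <- which(df$group == 1)
idx[which.max(df$values[idx])]   # 6

# The same mapping for every group at once
by_group <- sapply(split(seq_len(nrow(df)), df$group),
                   function(i) i[which.max(df$values[i])])
print(by_group)                  # "-1" -> 7, "1" -> 6

# Small made-up example with a value shared across groups
df2 <- data.frame(group = c(-1, 1, 1), values = c(0.5, 0.5, 0.2))
which(df2$values == max(df2$values[df2$group == 1]))  # 1 2, row 1 is in the wrong group
idx2 <- which(df2$group == 1)
idx2[which.max(df2$values[idx2])]                     # 2
```

The key step is that which.max() returns a position within the subset, and indexing idx with that position translates it back to a row of the original dataframe.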
