Home > Software engineering >  Filtering rows within a group
Filtering rows within a group

Time:05-18

I have a dataframe like the following:

ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5)
Type <- c('A','B','C','B', 'C','A','B', 'C','A','C', 'C')
Value <- c(10, 11, 12, 1, 2, 100, 101, 102, -1, -2, -10)

df <- data.frame(ID, Type, Value)

My goal is a dataframe, which only has one row for each ID. But I would like to select the rows based on certain criteria / waterfall principle.

So type A is preferred, then type B, then type C. My goal is the following dataframe:

ID <- c(1, 2, 3, 4, 5)
Type <- c('A','B','A','A', 'C')
Value <- c(10, 1, 100, -1, -10)

df_goal <- data.frame(ID, Type, Value)

I thought aboubt pivoting into wide format and then use dplyr::coalesce, but then I would loose the Type-column. I am pretty sure there is a solution using dplyr::group_by and dplyr::summarise. Initially I also thought about using dplyr::if_else in a dplyr::filter statement, but this is not possible I think.

Any help is appreciated!

CodePudding user response:

In dplyr, use slice_min, which by default filters the minimum value by group ordered by a variable (here Type, which is sorted by alphabetical order).

library(dplyr)
df %>% 
  group_by(ID) %>% 
  slice_min(Type)

# A tibble: 5 × 3
# Groups:   ID [5]
     ID Type  Value
  <dbl> <chr> <dbl>
1     1 A        10
2     2 B         1
3     3 A       100
4     4 A        -1
5     5 C       -10

In base R:

df[with(df, Type == ave(Type, ID, FUN = min)), ]

CodePudding user response:

data.table solution

library(data.table)

setDT(df) # make it a data.table if it is not yet

setkey(df, Type) # set the key on Type
df[, .SD[1L], by = ID] # subset the first value per ID

#    ID Type Value
# 1:  1    A    10
# 2:  3    A   100
# 3:  4    A    -1
# 4:  2    B     1
# 5:  5    C   -10
  • Related