I have a dataframe like the following:
ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5)
Type <- c('A','B','C','B', 'C','A','B', 'C','A','C', 'C')
Value <- c(10, 11, 12, 1, 2, 100, 101, 102, -1, -2, -10)
df <- data.frame(ID, Type, Value)
My goal is a dataframe, which only has one row for each ID. But I would like to select the rows based on certain criteria / waterfall principle.
So type A is preferred, then type B, then type C. My goal is the following dataframe:
ID <- c(1, 2, 3, 4, 5)
Type <- c('A','B','A','A', 'C')
Value <- c(10, 1, 100, -1, -10)
df_goal <- data.frame(ID, Type, Value)
I thought aboubt pivoting into wide format and then use dplyr::coalesce
, but then I would loose the Type-column. I am pretty sure there is a solution using dplyr::group_by
and dplyr::summarise
. Initially I also thought about using dplyr::if_else
in a dplyr::filter
statement, but this is not possible I think.
Any help is appreciated!
CodePudding user response:
In dplyr
, use slice_min
, which by default filters the minimum value by group ordered by a variable (here Type
, which is sorted by alphabetical order).
library(dplyr)
df %>%
group_by(ID) %>%
slice_min(Type)
# A tibble: 5 × 3
# Groups: ID [5]
ID Type Value
<dbl> <chr> <dbl>
1 1 A 10
2 2 B 1
3 3 A 100
4 4 A -1
5 5 C -10
In base R:
df[with(df, Type == ave(Type, ID, FUN = min)), ]
CodePudding user response:
data.table solution
library(data.table)
setDT(df) # make it a data.table if it is not yet
setkey(df, Type) # set the key on Type
df[, .SD[1L], by = ID] # subset the first value per ID
# ID Type Value
# 1: 1 A 10
# 2: 3 A 100
# 3: 4 A -1
# 4: 2 B 1
# 5: 5 C -10