Home > Blockchain >  R dplyr slice min within a subset of dataset
R dplyr slice min within a subset of dataset

Time:04-05

If this is my dataset

Id    Group1   Col1       Col2
1     Red      2/1/1999   3
1     Red      4/11/1998  4
2     Black    7/8/1995   NA
2     Black    11/2/2000  1
3     Black    11/1/2994  2
3     Black    5/18/1997  6

How can I implement slice_min OR filter(Col2= min(Col2)|is.na(Col2)) to retain only rows with minimum of Col2 only where Group1=Red ?

Expected output

Id    Group1   Col1       Col2
1     Red      2/1/1999   3

2     Black    7/8/1995   NA
2     Black    11/2/2000  1
3     Black    11/1/2994  2
3     Black    5/18/1997  6

Thanks in advance for any suggestions.

CodePudding user response:

You can use slice_min only on the subset you want, and rbind the result to the rest of the df

rbind(
  df %>% filter(Group1 == "Red") %>% slice_min(Col2),
  df %>% filter(Group1 != "Red")
)

Output:

     Id Group1 Col1       Col2
  <int> <chr>  <chr>     <int>
1     1 Red    2/1/1999      3
2     2 Black  7/8/1995     NA
3     2 Black  11/2/2000     1
4     3 Black  11/1/2994     2
5     3 Black  5/18/1997     6

CodePudding user response:

You could use a slightly more complex filter condition:

library(dplyr)

df %>% 
  filter((Group1 == "Red" & (Col2 == min(.[.$Group1 == "Red",]$Col2) | is.na(Col2))) | Group1 != "Red")

This returns

# A tibble: 5 x 4
     Id Group1 Col1       Col2
  <dbl> <chr>  <chr>     <dbl>
1     1 Red    2/1/1999      3
2     2 Black  7/8/1995     NA
3     2 Black  11/2/2000     1
4     3 Black  11/1/2994     2
5     3 Black  5/18/1997     6

Or a more elegant way as proposed by Ritchie Sacramento:

df %>%  
  group_by(Group1) %>%  
  filter(Col2 == min(Col2, na.rm = TRUE) | is.na(Col2) | Group1 != "Red")

Data

structure(list(Id = c(1, 1, 2, 2, 3, 3), Group1 = c("Red", "Red", 
"Black", "Black", "Black", "Black"), Col1 = c("2/1/1999", "4/11/1998", 
"7/8/1995", "11/2/2000", "11/1/2994", "5/18/1997"), Col2 = c(3, 
4, NA, 1, 2, 6)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
    Id = structure(list(), class = c("collector_double", "collector"
    )), Group1 = structure(list(), class = c("collector_character", 
    "collector")), Col1 = structure(list(), class = c("collector_character", 
    "collector")), Col2 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))
  • Related