Home > Net >  Filter data but keep at least one row for each ID
Filter data but keep at least one row for each ID

Time:04-21

In this example data, I want filter out those rows where col1 and col2 are both zeros.

df <- data.frame(
  id = rep(letters[1:3], each = 2),
  col1 = c(1, 1, 0, 0, 0, 0),
  col2 = c(0, 1, 0, 0, 1, 0),
  col3 = LETTERS[1:6]
)

This is what I tried:

library(dplyr)

df %>%
  filter(col1 == 1 | col2 == 1)

#   id col1 col2 col3
# 1  a    1    0    A
# 2  a    1    1    B
# 3  c    0    1    E

It works fine but all rows where id is "b" are removed. Are there any ways to keep at least one row for each id? My expected output is:

#   id col1 col2 col3
# 1  a    1    0    A
# 2  a    1    1    B
# 3  b   NA   NA    -
# 4  c    0    1    E

CodePudding user response:

You could try grouping:

library(dplyr)

df %>% 
  group_by(id) %>% 
  filter(col1 == 1 | col2 == 1 | row_number() == 1) %>% 
  ungroup()

This returns

# A tibble: 4 x 4
  id     col1  col2 col3 
  <chr> <dbl> <dbl> <chr>
1 a         1     0 A    
2 a         1     1 B    
3 b         0     0 C    
4 c         0     1 E  

CodePudding user response:

You can use tidyr::complete():

df %>%
  filter(col1 == 1 | col2 == 1) %>%
  tidyr::complete(id = df$id, fill = list(col3 = "-"))

# # A tibble: 4 × 4
#   id     col1  col2 col3 
#   <chr> <dbl> <dbl> <chr>
# 1 a         1     0 A    
# 2 a         1     1 B    
# 3 b        NA    NA -    
# 4 c         0     1 E   
  • Related