Home > database >  filter a column based on values of another column
filter a column based on values of another column

Time:07-12

I have a dataset that looks like this

data <- data.frame(ID = c("1a", "1b", "2a", "2b", "3a", "4b", "5a", "5b"),
           Sex = c(1, 2, 2, 1, 1, 2, 1, 2))
 ID Sex
 1a   1
 1b   2
 2a   2
 2b   1
 3a   1
 4b   2
 5a   1
 5b   2

I want to filter based on ID. Specifically, if there are same numbers in a string in ID, such as 1a and 1b, 2a and 2b, and 5a and 5b, then I want to filter rows with Sex = 1. Additionally, I want to keep the rows with 3a and 4b, because it does not have its counterparts of 3b and 4a, regardless of its value in Sex.

My final desired output is:

 ID Sex
 1a   1
 2b   1
 3a   1
 4b   2
 5a   1

Thank you for your help!

CodePudding user response:

We may group by the numeric part of 'ID', and filter where Sex is 1 or (|) when number of rows is 1

library(dplyr)
data %>% 
   group_by(grp = readr::parse_number(ID)) %>%
   filter(Sex == 1|n() ==1) %>%
   ungroup %>% 
   select(-grp)

-output

# A tibble: 5 × 2
  ID      Sex
  <chr> <dbl>
1 1a        1
2 2b        1
3 3a        1
4 4b        2
5 5a        1

CodePudding user response:

Idea:

  • split ID between number and letter to check whether there are several letters for each ID first number
  • group by number
  • keep when there's one letter max, or when ID_2 contains both "a" and "b" and Sex == 1
library(dplyr)
library(tidyr)

data <- data.frame(ID = c("1a", "1b", "2a", "2b", "3a", "4b", "5a", "5b"),
                   Sex = c(1, 2, 2, 1, 1, 2, 1, 2))

data %>% 
  separate(ID, into = c("ID_1", "ID_2"), sep = 1) %>% 
  group_by(ID_1) %>% 
  filter(n() <= 1 | (all(c("a", "b") %in% ID_2) & Sex == 1)) %>% 
  ungroup() %>% 
  unite(col = "ID", ID_1, ID_2, sep = "")
#> # A tibble: 5 × 2
#>   ID      Sex
#>   <chr> <dbl>
#> 1 1a        1
#> 2 2b        1
#> 3 3a        1
#> 4 4b        2
#> 5 5a        1

Created on 2022-07-11 by the reprex package (v2.0.1)

CodePudding user response:

Try this

library(dplyr)

data |> group_by(sub("\\D" , "" , ID)) |>
filter(n() == 1 | Sex == 1) |> ungroup() |>
select(ID , Sex)
  • output
# A tibble: 5 × 2
  ID      Sex
  <chr> <dbl>
1 1a        1
2 2b        1
3 3a        1
4 4b        2
5 5a        1

CodePudding user response:

Another possible solution:

library(dplyr)

data %>% 
  add_count(gsub("[a-z]", "", ID)) %>% 
  filter(Sex == 1 | n == 1) %>% 
  select(ID, Sex)

#>   ID Sex
#> 1 1a   1
#> 2 2b   1
#> 3 3a   1
#> 4 4b   2
#> 5 5a   1

CodePudding user response:

One more similar to @akrun's solution:

library(dplyr)

data %>% 
  group_by(group_id = as.numeric(gsub("\\D", "", ID))) %>% 
  arrange(Sex, .by_group = TRUE) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(-group_id)
  ID      Sex
  <chr> <dbl>
1 1a        1
2 2b        1
3 3a        1
4 4b        2
5 5a        1
  • Related