I have a dataset that looks like this
data <- data.frame(ID = c("1a", "1b", "2a", "2b", "3a", "4b", "5a", "5b"),
Sex = c(1, 2, 2, 1, 1, 2, 1, 2))
ID Sex
1a 1
1b 2
2a 2
2b 1
3a 1
4b 2
5a 1
5b 2
I want to filter based on ID
. Specifically, if there are same numbers in a string in ID
, such as 1a and 1b, 2a and 2b, and 5a and 5b, then I want to filter rows with Sex
= 1. Additionally, I want to keep the rows with 3a and 4b, because it does not have its counterparts of 3b and 4a, regardless of its value in Sex
.
My final desired output is:
ID Sex
1a 1
2b 1
3a 1
4b 2
5a 1
Thank you for your help!
CodePudding user response:
We may group by the numeric part of 'ID', and filter
where Sex
is 1 or (|
) when number of rows is 1
library(dplyr)
data %>%
group_by(grp = readr::parse_number(ID)) %>%
filter(Sex == 1|n() ==1) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 5 × 2
ID Sex
<chr> <dbl>
1 1a 1
2 2b 1
3 3a 1
4 4b 2
5 5a 1
CodePudding user response:
Idea:
- split ID between number and letter to check whether there are several letters for each ID first number
- group by number
- keep when there's one letter max, or when ID_2 contains both "a" and "b" and Sex == 1
library(dplyr)
library(tidyr)
data <- data.frame(ID = c("1a", "1b", "2a", "2b", "3a", "4b", "5a", "5b"),
Sex = c(1, 2, 2, 1, 1, 2, 1, 2))
data %>%
separate(ID, into = c("ID_1", "ID_2"), sep = 1) %>%
group_by(ID_1) %>%
filter(n() <= 1 | (all(c("a", "b") %in% ID_2) & Sex == 1)) %>%
ungroup() %>%
unite(col = "ID", ID_1, ID_2, sep = "")
#> # A tibble: 5 × 2
#> ID Sex
#> <chr> <dbl>
#> 1 1a 1
#> 2 2b 1
#> 3 3a 1
#> 4 4b 2
#> 5 5a 1
Created on 2022-07-11 by the reprex package (v2.0.1)
CodePudding user response:
Try this
library(dplyr)
data |> group_by(sub("\\D" , "" , ID)) |>
filter(n() == 1 | Sex == 1) |> ungroup() |>
select(ID , Sex)
- output
# A tibble: 5 × 2
ID Sex
<chr> <dbl>
1 1a 1
2 2b 1
3 3a 1
4 4b 2
5 5a 1
CodePudding user response:
Another possible solution:
library(dplyr)
data %>%
add_count(gsub("[a-z]", "", ID)) %>%
filter(Sex == 1 | n == 1) %>%
select(ID, Sex)
#> ID Sex
#> 1 1a 1
#> 2 2b 1
#> 3 3a 1
#> 4 4b 2
#> 5 5a 1
CodePudding user response:
One more similar to @akrun's solution:
library(dplyr)
data %>%
group_by(group_id = as.numeric(gsub("\\D", "", ID))) %>%
arrange(Sex, .by_group = TRUE) %>%
slice(1) %>%
ungroup() %>%
select(-group_id)
ID Sex
<chr> <dbl>
1 1a 1
2 2b 1
3 3a 1
4 4b 2
5 5a 1