Code | count | AA | BB | CC |
---|---|---|---|---|
101 | 1 | No | NO | 4 |
101 | 2 | Yes | NO | 5 |
101 | 3 | Yes | NO | 10 |
102 | 1 | Yes | NO | 7 |
102 | 2 | Yes | NO | 40 |
102 | 3 | Yes | NO | 6 |
102 | 4 | No | NO | 12 |
I want to apply the condition as,
- If the
count
column is 1 with respect tocode
column thenAA
should be"NO"
andBB
should be"NO"
. - For
count
between the max and mincount
with respect tocode
column thenAA
can be"NO"
or"YES"
andBB
should be"NO"
. - For the max
count
column with respect tocode
column thenAA
should be"NO"
andBB
should be"NO"
.
Code | count | AA | BB | CC |
---|---|---|---|---|
101 | 1 | No | NO | 4 |
101 | 2 | Yes | NO | 5 |
102 | 2 | Yes | NO | 40 |
102 | 3 | Yes | NO | 6 |
102 | 4 | No | NO | 12 |
Hi,@Darren Tsai Whatever might be the case if the count column is 1 then it is getting deleted completely, by using you code I am getting the below output
Code | count | AA | BB | CC |
---|---|---|---|---|
101 | 2 | Yes | NO | 5 |
102 | 2 | Yes | NO | 40 |
102 | 3 | Yes | NO | 6 |
102 | 4 | No | NO | 12 |
CodePudding user response:
A dplyr
solution:
library(dplyr)
df %>%
group_by(Code) %>%
mutate(flag = count %in% range(count)) %>%
filter(flag & if_all(c(AA, BB), ~ toupper(.x) == 'NO') | !flag & toupper(BB) == 'NO') %>%
ungroup() %>%
select(-flag)
# # A tibble: 5 × 5
# Code count AA BB CC
# <int> <int> <chr> <chr> <int>
# 1 101 1 No NO 4
# 2 101 2 Yes NO 5
# 3 102 2 Yes NO 40
# 4 102 3 Yes NO 6
# 5 102 4 No NO 12
A base
equivalent:
df |>
transform(flag = ave(count, Code, FUN = \(x) x %in% range(x))) |>
subset(flag & toupper(AA) == 'NO' & toupper(BB) == 'NO' | !flag & toupper(BB) == 'NO', -flag)
Data
df <- structure(list(Code = c(101L, 101L, 101L, 102L, 102L, 102L, 102L),
count = c(1L, 2L, 3L, 1L, 2L, 3L, 4L), AA = c("No", "Yes",
"Yes", "Yes", "Yes", "Yes", "No"), BB = c("NO", "NO", "NO", "NO",
"NO", "NO", "NO"), CC = c(4L, 5L, 10L, 7L, 40L, 6L, 12L)), class = "data.frame", row.names = c(NA,-7L))
Update with another dataset
This dataset has 12 rows with 3 ID 8540
, 2254
, 607
. After running my code the 2nd, 4th, 12th rows are removed.
library(dplyr)
df2 <- structure(list(Unique_Id = c(8540, 8540, 2254, 2254, 607, 607, 607, 607, 607, 607, 607, 607),
AA = c("No", "Yes", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No"),
count = c(1, 2, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8),
BB = c("No", "Yes", "No", "Yes", "No", "No", "No", "No", "No", "No", "No", "Yes")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))
df2
# A tibble: 12 × 4
Unique_Id AA count BB
<dbl> <chr> <dbl> <chr>
1 8540 No 1 No
2 8540 Yes 2 Yes
3 2254 No 1 No
4 2254 No 2 Yes
5 607 No 1 No
6 607 No 2 No
7 607 No 3 No
8 607 No 4 No
9 607 No 5 No
10 607 No 6 No
11 607 No 7 No
12 607 No 8 Yes
df2 %>%
group_by(Unique_Id) %>%
mutate(flag = count %in% range(count)) %>%
filter(flag & if_all(c(AA, BB), ~ toupper(.x) == 'NO') | !flag & toupper(BB) == 'NO') %>%
ungroup() %>%
select(-flag)
# A tibble: 9 × 4
Unique_Id AA count BB
<dbl> <chr> <dbl> <chr>
1 8540 No 1 No
2 2254 No 1 No
3 607 No 1 No
4 607 No 2 No
5 607 No 3 No
6 607 No 4 No
7 607 No 5 No
8 607 No 6 No
9 607 No 7 No