Home > Software design >  Apply multiple conditions to single column
Apply multiple conditions to single column

Time:12-30

Code count AA BB CC
101 1 No NO 4
101 2 Yes NO 5
101 3 Yes NO 10
102 1 Yes NO 7
102 2 Yes NO 40
102 3 Yes NO 6
102 4 No NO 12

I want to apply the condition as,

  • If the count column is 1 with respect to code column then AA should be "NO" and BB should be "NO".
  • For count between the max and min count with respect to code column then AA can be "NO" or "YES" and BB should be "NO".
  • For the max count column with respect to code column then AA should be "NO" and BB should be "NO".
Code count AA BB CC
101 1 No NO 4
101 2 Yes NO 5
102 2 Yes NO 40
102 3 Yes NO 6
102 4 No NO 12

Hi,@Darren Tsai Whatever might be the case if the count column is 1 then it is getting deleted completely, by using you code I am getting the below output

Code count AA BB CC
101 2 Yes NO 5
102 2 Yes NO 40
102 3 Yes NO 6
102 4 No NO 12

CodePudding user response:

A dplyr solution:

library(dplyr)

df %>%
  group_by(Code) %>%
  mutate(flag = count %in% range(count)) %>%
  filter(flag & if_all(c(AA, BB), ~ toupper(.x) == 'NO') | !flag & toupper(BB) == 'NO') %>%
  ungroup() %>%
  select(-flag)

# # A tibble: 5 × 5
#    Code count AA    BB       CC
#   <int> <int> <chr> <chr> <int>
# 1   101     1 No    NO        4
# 2   101     2 Yes   NO        5
# 3   102     2 Yes   NO       40
# 4   102     3 Yes   NO        6
# 5   102     4 No    NO       12

A base equivalent:

df |>
  transform(flag = ave(count, Code, FUN = \(x) x %in% range(x))) |>
  subset(flag & toupper(AA) == 'NO' & toupper(BB) == 'NO' | !flag & toupper(BB) == 'NO', -flag)

Data
df <- structure(list(Code = c(101L, 101L, 101L, 102L, 102L, 102L, 102L),
count = c(1L, 2L, 3L, 1L, 2L, 3L, 4L), AA = c("No", "Yes",
"Yes", "Yes", "Yes", "Yes", "No"), BB = c("NO", "NO", "NO", "NO",
"NO", "NO", "NO"), CC = c(4L, 5L, 10L, 7L, 40L, 6L, 12L)), class = "data.frame", row.names = c(NA,-7L))

Update with another dataset

This dataset has 12 rows with 3 ID 8540, 2254, 607. After running my code the 2nd, 4th, 12th rows are removed.

library(dplyr)

df2 <- structure(list(Unique_Id = c(8540, 8540, 2254, 2254, 607, 607, 607, 607, 607, 607, 607, 607),
                      AA = c("No", "Yes", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No"),
                      count = c(1, 2, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8),
                      BB = c("No", "Yes", "No", "Yes", "No", "No", "No", "No", "No", "No", "No", "Yes")),
                      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))

df2
# A tibble: 12 × 4
   Unique_Id AA    count BB
       <dbl> <chr> <dbl> <chr>
 1      8540 No        1 No
 2      8540 Yes       2 Yes
 3      2254 No        1 No
 4      2254 No        2 Yes
 5       607 No        1 No
 6       607 No        2 No   
 7       607 No        3 No
 8       607 No        4 No
 9       607 No        5 No
10       607 No        6 No
11       607 No        7 No
12       607 No        8 Yes

df2 %>%
  group_by(Unique_Id) %>%
  mutate(flag = count %in% range(count)) %>%
  filter(flag & if_all(c(AA, BB), ~ toupper(.x) == 'NO') | !flag & toupper(BB) == 'NO') %>%
  ungroup() %>%
  select(-flag)

# A tibble: 9 × 4
  Unique_Id AA    count BB
      <dbl> <chr> <dbl> <chr>
1      8540 No        1 No
2      2254 No        1 No
3       607 No        1 No
4       607 No        2 No
5       607 No        3 No
6       607 No        4 No
7       607 No        5 No
8       607 No        6 No   
9       607 No        7 No
  • Related