Home > front end >  New column with multiple conditions dplyr
New column with multiple conditions dplyr

Time:03-21

This is my sample data:

id <- c("1a","2c","3d","4f","5g","6e","7f","8q","9r","10v","11x","12l")
dg1 <- c(111234,902754,111235,111236,113456,123754,288563,396186,987654,987865,288563,396186)
dg2 <-c("",111235,111236,113456,"","","","","","",902754,902754)
df<-cbind(id,dg1,dg2)

I'd like to create a new column that indicates whether a string (111 or 113) is present across multiple columns such that my final df looks like this:

dt1 <- c(1,1,1,1,1,0,0,0,0,0,0,0)
df <- cbind(df,dt1)

I've tried to do this:

df %>%
  filter(any_vars(grepl('^(113|111)')))

and

df %>%
  select(contains("113","111"))

with the intention of tagging the filtered rows with mutate then doing a left join into the original data frame.

I've tried to Frankenstein together some code from similar problems to no avail here and here. I'm trying to avoid this solution because my actual data has over a hundred columns to sort through with dozens of strings to search through (trying to avoid typing out and potentially missing a combination).

Can anyone help?

CodePudding user response:

One way to do is.

Data:

id <- c("1a","2c","3d","4f","5g","6e","7f","8q","9r","10v","11x","12l")
dg1 <- c(111234,902754,111235,111236,113456,123754,288563,396186,987654,987865,288563,396186)
dg2 <-c("",111235,111236,113456,"","","","","","",902754,902754)
df<-cbind.data.frame(id,dg1,dg2)

Code:

    df = df %>% mutate(d1= str_match(df$dg1, '^(113|111)') %>% .[,1] %>% {ifelse(.> 0, 1,0 )}) %>% 
  mutate(d2 = str_match(df$dg2, '^(113|111)') %>% .[,1] %>% {ifelse(.> 0, 1,0 )})%>% 
  mutate(dt1 = ifelse(d1 > 0 | d2 > 0, 1,0)) %>% 
  select(id, dg1, dg2, dt1)

    id    dg1    dg2 dt1
1   1a 111234          1
2   2c 902754 111235   1
3   3d 111235 111236   1
4   4f 111236 113456   1
5   5g 113456          1
6   6e 123754         NA
7   7f 288563         NA
8   8q 396186         NA
9   9r 987654         NA
10 10v 987865         NA
11 11x 288563 902754  NA
12 12l 396186 902754  NA

CodePudding user response:

Another option would be:

Check whether either of dg1 or dg2 begin with 111 or 113 using grepl()


df <- df %>% mutate(dt1= ifelse(grepl("^111|^113", dg1) == TRUE |
                                grepl("^111|^113", dg2) == TRUE, 1, 0))

Output

> df
    id    dg1    dg2 dt1
1   1a 111234          1
2   2c 902754 111235   1
3   3d 111235 111236   1
4   4f 111236 113456   1
5   5g 113456          1
6   6e 123754          0
7   7f 288563          0
8   8q 396186          0
9   9r 987654          0
10 10v 987865          0
11 11x 288563 902754   0
12 12l 396186 902754   0

Test if equal to your desired output

dt1 <- c(1,1,1,1,1,0,0,0,0,0,0,0)
df1 <- cbind(df,dt1)

identical(df, df1)

> identical(df, df1)
[1] TRUE
  • Related