This is my sample data:
id <- c("1a","2c","3d","4f","5g","6e","7f","8q","9r","10v","11x","12l")
dg1 <- c(111234,902754,111235,111236,113456,123754,288563,396186,987654,987865,288563,396186)
dg2 <-c("",111235,111236,113456,"","","","","","",902754,902754)
df<-cbind(id,dg1,dg2)
I'd like to create a new column that indicates whether a string (111 or 113) is present across multiple columns such that my final df looks like this:
dt1 <- c(1,1,1,1,1,0,0,0,0,0,0,0)
df <- cbind(df,dt1)
I've tried to do this:
df %>%
filter(any_vars(grepl('^(113|111)')))
and
df %>%
select(contains("113","111"))
with the intention of tagging the filtered rows with mutate then doing a left join into the original data frame.
I've tried to Frankenstein together some code from similar problems to no avail here and here. I'm trying to avoid this solution because my actual data has over a hundred columns to sort through with dozens of strings to search through (trying to avoid typing out and potentially missing a combination).
Can anyone help?
CodePudding user response:
One way to do is.
Data:
id <- c("1a","2c","3d","4f","5g","6e","7f","8q","9r","10v","11x","12l")
dg1 <- c(111234,902754,111235,111236,113456,123754,288563,396186,987654,987865,288563,396186)
dg2 <-c("",111235,111236,113456,"","","","","","",902754,902754)
df<-cbind.data.frame(id,dg1,dg2)
Code:
df = df %>% mutate(d1= str_match(df$dg1, '^(113|111)') %>% .[,1] %>% {ifelse(.> 0, 1,0 )}) %>%
mutate(d2 = str_match(df$dg2, '^(113|111)') %>% .[,1] %>% {ifelse(.> 0, 1,0 )})%>%
mutate(dt1 = ifelse(d1 > 0 | d2 > 0, 1,0)) %>%
select(id, dg1, dg2, dt1)
id dg1 dg2 dt1
1 1a 111234 1
2 2c 902754 111235 1
3 3d 111235 111236 1
4 4f 111236 113456 1
5 5g 113456 1
6 6e 123754 NA
7 7f 288563 NA
8 8q 396186 NA
9 9r 987654 NA
10 10v 987865 NA
11 11x 288563 902754 NA
12 12l 396186 902754 NA
CodePudding user response:
Another option would be:
Check whether either of dg1
or dg2
begin with 111 or 113 using grepl()
df <- df %>% mutate(dt1= ifelse(grepl("^111|^113", dg1) == TRUE |
grepl("^111|^113", dg2) == TRUE, 1, 0))
Output
> df
id dg1 dg2 dt1
1 1a 111234 1
2 2c 902754 111235 1
3 3d 111235 111236 1
4 4f 111236 113456 1
5 5g 113456 1
6 6e 123754 0
7 7f 288563 0
8 8q 396186 0
9 9r 987654 0
10 10v 987865 0
11 11x 288563 902754 0
12 12l 396186 902754 0
Test if equal to your desired output
dt1 <- c(1,1,1,1,1,0,0,0,0,0,0,0)
df1 <- cbind(df,dt1)
identical(df, df1)
> identical(df, df1)
[1] TRUE