Suppose I have the following tibble:
tibble(
examform1 = c("Bla bla bla pass/fail", "Bla bla bla 7 point scale", "Bla bla pass fail"),
examform2 = c("passfail bla", "7pointscale bla", "Bla bla")
)
# A tibble: 3 × 2
examform1 examform2
<chr> <chr>
1 Bla bla bla pass/fail passfail bla
2 Bla bla bla 7 point scale 7pointscale bla
3 Bla bla pass fail Bla bla
I want to count the occurrences of the strings in the following two vectors. Specifically, I want to end up with two columns: one that counts the occurrences of any string from the vector pass, and another that does the same for the vector scale.
pass <- c("pass/fail", "pass fail", "passfail")
scale <- c("7 point scale", "7pointscale")
I have a very large data frame and want to carry out the operation across all variables, since I am not sure which variables hold the information I need. The result should look like this:
# A tibble: 3 × 4
examform1 examform2 occurence_pass pass_scale
<chr> <chr> <dbl> <dbl>
1 Bla bla bla pass/fail passfail bla 2 0
2 Bla bla bla 7 point scale 7pointscale bla 0 2
3 Bla bla pass fail Bla bla 1 0
I could paste all the variables together and carry on from there, but I think that would be very slow because my real strings are very long, and I am unsure how to continue after pasting.
Any help is greatly appreciated. I hope I made my question clear :-)!
CodePudding user response:
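Assuming the tibble is stored in df, you can apply grepl row-wise. Each vector is collapsed into a single regular expression with |, and the matching cells are counted per row:
# apply() returns one column per row of df, so colSums() gives the per-row count
df$occurence_pass <- colSums(apply(df, 1, function(i) grepl(paste(pass, collapse = '|'), i)))
df$pass_scale <- colSums(apply(df, 1, function(i) grepl(paste(scale, collapse = '|'), i)))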
df
# A tibble: 3 x 4
examform1 examform2 occurence_pass pass_scale
<chr> <chr> <dbl> <dbl>
1 Bla bla bla pass/fail passfail bla 2 0
2 Bla bla bla 7 point scale 7pointscale bla 0 2
3 Bla bla pass fail Bla bla 1 0
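Since the real data frame is described as very large, a column-wise variant may be worth trying, because grepl is vectorised over a whole column while apply(df, 1, ...) calls it once per row. The following is only a sketch in base R: count_cells is a hypothetical helper name, the example data is rebuilt as a plain data.frame, and the strings are treated as regular expressions, so any regex metacharacters in real search terms would need escaping. Like the code above, it counts the cells that match, not repeated matches within one cell.

```r
# Example data from the question, rebuilt as a base data.frame (assumption: stored in df)
df <- data.frame(
  examform1 = c("Bla bla bla pass/fail", "Bla bla bla 7 point scale", "Bla bla pass fail"),
  examform2 = c("passfail bla", "7pointscale bla", "Bla bla"),
  stringsAsFactors = FALSE
)
pass  <- c("pass/fail", "pass fail", "passfail")
scale <- c("7 point scale", "7pointscale")

# count_cells() is a hypothetical helper: for each row, it returns the number
# of columns whose text matches at least one of the patterns.
count_cells <- function(data, patterns) {
  rx <- paste(patterns, collapse = "|")            # one regex alternation
  # One vectorised grepl() call per column, giving a rows x columns logical matrix
  hits <- vapply(data, function(col) grepl(rx, col), logical(nrow(data)))
  # matrix() keeps the shape correct even when data has a single row
  rowSums(matrix(hits, nrow = nrow(data)))
}

# Compute both counts before adding columns, so the new columns are not searched
pass_count  <- count_cells(df, pass)
scale_count <- count_cells(df, scale)

df$occurence_pass <- pass_count
df$pass_scale     <- scale_count
```

Computing both counts before assigning them keeps the newly added count columns out of the search.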