I have this type of data:
set.seed(123)
df <- data.frame(
v1 = sample(LETTERS[1:10], 5),
v2 = sample(LETTERS[1:10], 5),
v3 = sample(LETTERS[1:10], 5),
v4 = sample(LETTERS[1:10], 5)
)
as well as a number of key
words:
keys <- c("A", "C", "F", "H")
I want to (i) detect which key
word is present in each column of df
and (ii) create new columns for each of the key
s, recording with 1
if that key
is present. I've been using a for
loop so far, which, however, does not work:
library(stringr)
for(i in keys){
df[i] <- str_detect(apply(df, 1, paste0, collapse = " "), keys[i])
}
df
v1 v2 v3 v4 A C F H
1 C A J I NA NA NA NA
2 H E E C NA NA NA NA
3 D H F A NA NA NA NA
4 G D I J NA NA NA NA
5 F C A F NA NA NA NA
I don't know why it doesn't work, as the same logic applied to a single key
does work:
str_detect(apply(df, 1, paste0, collapse = " "), keys[1])
[1] 1 0 1 0 1
I'd be grateful for advice on how the for
loop must be tweaked but also for an entirely different solution to the problem (e.g., dplyr
).
CodePudding user response:
A base
solution:
cbind(df, sapply(keys, grepl, x = do.call(paste, df)))
# v1 v2 v3 v4 A C F H
# 1 C E J I 0 1 0 0
# 2 J D E J 0 0 0 0
# 3 B F C E 0 1 1 0
# 4 H H H C 0 1 0 1
# 5 F A A B 1 0 1 0
CodePudding user response:
In your for loop you are indexing on both the index and the name. Test keys["A"]
, it'll return NA
. You can use which
to avoid this:
library(stringr)
for(i in keys){
df[i] <- ifelse(str_detect(apply(df, 1, paste0, collapse = " "), keys[which(keys == i)]), 1, 0)
}
output
v1 v2 v3 v4 A C F H
1 C E J I 0 1 0 0
2 J D E J 0 0 0 0
3 B F C E 0 1 1 0
4 H H H C 0 1 0 1
5 F A A B 1 0 1 0
Another option would be to index on the position (1, 2, 3, 4):
for(i in seq_along(keys)){
df[keys[i]] <- ifelse(str_detect(apply(df, 1, paste0, collapse = " "), keys[i]), 1, 0)
}
CodePudding user response:
Using dplyr
's if_any
:
library(stringr)
library(dplyr)
library(purrr)
bind_cols(df,
map_dfc(keys, \(x) df %>%
transmute(!!(x) := (if_any(v1:v4, \(y) str_detect(x, y))))))
output
v1 v2 v3 v4 A C F H
1 C E J I 0 1 0 0
2 J D E J 0 0 0 0
3 B F C E 0 1 1 0
4 H H H C 0 1 0 1
5 F A A B 1 0 1 0