I have the following dataset called df:
structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
"e f", "b c", "f g", "c d", "y z", "t u")), class = "data.frame", row.names = c(NA,
-12L))
For this dataset, I have two vectors of matches: matching1 <- c("a b", "b c", "c d") and matching2 <- c("c d","e f","f g"). In my df, I would like to create a new column and assign a value for each match: 1 for strings in matching1, 2 for strings in matching2, and 3 for every string that is not matched. Ideally, the assignment for matching2 would not overwrite the value already assigned for matching1, because both vectors contain the string "c d". I know I can use:
matches1 <- paste0(na.omit(matching1), "", collapse = "|")
to create a collapsed pattern in which the elements are joined by | (or), and I have tried to combine it with case_when. However, case_when only allows single patterns, and the list of potential matches in my original dataset is very long, so I would like to avoid spelling out every condition explicitly.
The output should look like this:
structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
"e f", "b c", "f g", "c d", "y z", "t u"), col2 = c("1", "2",
"3", "3", "3", "3", "2", "1", "2", "1", "3", "3")), class = "data.frame", row.names = c(NA,
-12L))
CodePudding user response:
I think this does it:
Edit: this performs the matching2 assignment first, to handle the situation where "c d" is in both vectors and matching1 is preferred:
df$ans<-ifelse(df$col1 %in% matching2, 2, 3)
df$ans<-ifelse(df$col1 %in% matching1, 1, df$ans)
Or the pre-edit version, incorporating langtang's comment:
df$ans<-ifelse(df$col1 %in% matching1, 1, 3)
df$ans<-ifelse(df$col1 %in% setdiff(matching2, matching1), 2, df$ans)
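The two-step assignment above can also be written as one nested ifelse, which makes the priority of matching1 over matching2 explicit. The sketch below rebuilds the data from the question so it runs on its own; the names v1 and v2 are only illustrative and do not appear in the original answer.

```r
# Data and vectors copied from the question
df <- data.frame(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
                          "e f", "b c", "f g", "c d", "y z", "t u"))
matching1 <- c("a b", "b c", "c d")
matching2 <- c("c d", "e f", "f g")

# Two-step version: assign 2 first, then let matching1 overwrite with 1
v1 <- ifelse(df$col1 %in% matching2, 2, 3)
v1 <- ifelse(df$col1 %in% matching1, 1, v1)

# Nested version: matching1 is tested first, so "c d" gets 1
v2 <- ifelse(df$col1 %in% matching1, 1,
             ifelse(df$col1 %in% matching2, 2, 3))

identical(v1, v2)  # TRUE
df$ans <- v2
```

Because %in% tests exact membership against the whole vector, no collapsed regex pattern is needed here.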
CodePudding user response:
Here is an option using data.table, with a merge:
library(data.table)
rbind(
  data.table(col1=matching1, col2=1),
  data.table(col1=setdiff(matching2,matching1), col2=2)
)[setDT(df), on="col1"][is.na(col2), col2:=3][]
Output:
      col1  col2
    <char> <num>
 1:    a b     1
 2:    d e     3
 3:    g f     3
 4:    h j     3
 5:    j k     3
 6:    y z     3
 7:    e f     2
 8:    b c     1
 9:    f g     2
10:    c d     1
11:    y z     3
12:    t u     3
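The same lookup-table idea can be sketched in base R without a join, assuming a named vector is an acceptable stand-in for the merged table. The names only2 and lookup are hypothetical helpers introduced for this illustration.

```r
# Data and vectors copied from the question
df <- data.frame(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
                          "e f", "b c", "f g", "c d", "y z", "t u"))
matching1 <- c("a b", "b c", "c d")
matching2 <- c("c d", "e f", "f g")

# Strings that are only in matching2, so matching1 keeps priority for "c d"
only2 <- setdiff(matching2, matching1)

# Named vector acting as the lookup table: names are the strings, values the codes
lookup <- c(setNames(rep(1, length(matching1)), matching1),
            setNames(rep(2, length(only2)), only2))

# Index by name; strings missing from the lookup come back as NA
df$col2 <- unname(lookup[df$col1])
df$col2[is.na(df$col2)] <- 3
```

As in the merge, unmatched strings are filled with 3 in a separate step after the lookup.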
CodePudding user response:
Two ways to solve your problem:
library(dplyr)
df %>%
  mutate(col2 = case_when(col1 %in% matching1 ~ 1,
                          col1 %in% matching2 ~ 2,
                          TRUE ~ 3))
   col1 col2
1   a b    1
2   d e    3
3   g f    3
4   h j    3
5   j k    3
6   y z    3
7   e f    2
8   b c    1
9   f g    2
10  c d    1
11  y z    3
12  t u    3
Or
library(data.table)
setDT(df)[, col2 := fcase(col1 %in% matching1, 1, col1 %in% matching2, 2, default=3)]
      col1  col2
    <char> <num>
 1:    a b     1
 2:    d e     3
 3:    g f     3
 4:    h j     3
 5:    j k     3
 6:    y z     3
 7:    e f     2
 8:    b c     1
 9:    f g     2
10:    c d     1
11:    y z     3
12:    t u     3
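If there are many match vectors rather than two, the conditions in case_when or fcase do not have to be written out one by one. A minimal base R sketch, assuming the vectors can be collected in a list ordered by priority (the name groups is hypothetical):

```r
# Data and vectors copied from the question
df <- data.frame(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
                          "e f", "b c", "f g", "c d", "y z", "t u"))
matching1 <- c("a b", "b c", "c d")
matching2 <- c("c d", "e f", "f g")

# Match vectors in priority order: earlier entries win on overlap
groups <- list(matching1, matching2)

# Start with the "no match" value, one more than the number of groups
col2 <- rep(length(groups) + 1, nrow(df))

# Loop from lowest to highest priority so higher-priority groups overwrite
for (i in rev(seq_along(groups))) {
  col2[df$col1 %in% groups[[i]]] <- i
}

df$col2 <- col2
```

Adding another vector to groups extends the scheme without further changes; the unmatched value then becomes 4.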