I have a data frame with a column containing a list of versions separated by a space (I actually read this from a huge file):
id= c("A", "B")
versions= c("v2 v3", "v1")
df=as.data.frame(cbind(id, versions))
I can split the versions values into a new column with strsplit(). Now I want to identify, for each possible version (v1, v2 or v3), whether it exists in the row.
library(dplyr)
df = df %>%
  mutate(
    versions_split = strsplit(versions, split = '[ ]'),
    v1 = ifelse(("v1" %in% versions_split), 1, 0),
    v2 = ifelse(("v2" %in% versions_split), 1, 0),
    v3 = ifelse(("v3" %in% versions_split), 1, 0)
  )
But it doesn't work:
df
>   id versions versions_split v1 v2 v3
> 1  A    v2 v3         v2, v3  1  0  0
> 2  B       v1             v1  1  0  0
I want it for each row in the data frame, and it seems it is doing it globally. What am I doing wrong?
CodePudding user response:
You could also use str_detect() to find the version and then case_when() to create the new variable.
library(dplyr)
library(stringr)
df <- tibble(
  id = c("A", "B"),
  versions = c("v2 v3", "v1")
)
df %>%
  mutate(
    v1 = case_when(
      str_detect(versions, "v1") ~ 1,
      TRUE ~ 0
    ),
    v2 = case_when(
      str_detect(versions, "v2") ~ 1,
      TRUE ~ 0
    ),
    v3 = case_when(
      str_detect(versions, "v3") ~ 1,
      TRUE ~ 0
    )
  )
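One caveat with plain substring patterns: "v1" is also found inside longer names such as "v10". A minimal sketch of how word boundaries avoid that, using base R's grepl() so it runs without extra packages; the same pattern can be passed to str_detect(). The "v10" value is made up for illustration and does not appear in the question.

```r
# Hypothetical data: "v10" is not in the question, it only shows the pitfall
versions <- c("v10 v3", "v1")

# Plain substring match: "v1" is also found inside "v10"
loose <- grepl("v1", versions, fixed = TRUE)

# Word boundaries (\\b) require "v1" to stand alone as a token
strict <- grepl("\\bv1\\b", versions, perl = TRUE)

loose   # TRUE TRUE
strict  # FALSE TRUE
```

If your version names can never be prefixes of one another, the simpler pattern is enough.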
CodePudding user response:
stringr::str_detect returns a logical value, which you can convert to 1 or 0 simply by adding a + sign in front:
library(dplyr)
library(stringr)
df %>%
  mutate(v1 = +str_detect(versions, "v1"),
         v2 = +str_detect(versions, "v2"),
         v3 = +str_detect(versions, "v3"))
Giving:
# A tibble: 2 × 5
  id    versions    v1    v2    v3
  <chr> <chr>    <int> <int> <int>
1 A     v2 v3        0     1     1
2 B     v1           1     0     0
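As for why the original attempt looks "global": inside mutate(), versions_split is the whole list column, so "v1" %in% versions_split produces a single TRUE/FALSE that is then recycled to every row. If you want to keep the strsplit() approach, here is a minimal base R sketch that tests membership inside each list element instead. It assumes the same toy data as in the question; the loop over c("v1", "v2", "v3") is my own choice, not something taken from the answers above.

```r
# Same toy data as in the question
df <- data.frame(
  id = c("A", "B"),
  versions = c("v2 v3", "v1"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row
versions_split <- strsplit(df$versions, split = " ", fixed = TRUE)

# Check each version against every row's own vector, not against the whole list
for (v in c("v1", "v2", "v3")) {
  df[[v]] <- as.integer(
    vapply(versions_split, function(x) v %in% x, logical(1))
  )
}

df
```

Because it compares whole tokens, this variant does not confuse "v1" with a longer name that merely contains it.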