How to check for a value in a vector in a dataframe column

Time:04-05

I have a data frame with a column containing a list of versions separated by a space (I actually read this from a huge file):

id= c("A", "B") 
versions= c("v2 v3", "v1") 
df=as.data.frame(cbind(id, versions))

I can split the versions values into a new list column with strsplit(). Now I want to flag, for each possible version (v1, v2 or v3), whether it appears in that row.

library(dplyr)
df = df %>%
  mutate(
    versions_split = strsplit(versions, split = "[ ]"),
    v1 = ifelse("v1" %in% versions_split, 1, 0),
    v2 = ifelse("v2" %in% versions_split, 1, 0),
    v3 = ifelse("v3" %in% versions_split, 1, 0)
  )

But it doesn't work:

df

  id versions versions_split v1 v2 v3
1  A    v2 v3         v2, v3  1  0  0
2  B       v1             v1  1  0  0

I want the check to be done for each row of the data frame, but it seems to be applied to the whole column at once. What am I doing wrong?
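A likely explanation: inside mutate(), `versions_split` is the whole list column, so `"v1" %in% versions_split` is evaluated once against all rows and returns a single TRUE/FALSE that is then recycled to every row. (`%in%` coerces the list to character, so row B's `"v1"` matches, while row A's `c("v2", "v3")` becomes a single string that never equals `"v2"`; that is why v1 is 1 everywhere and v2/v3 are 0.) A minimal sketch of a per-row check, assuming base `vapply()` is acceptable; the helper `has_version()` is a hypothetical name, not part of dplyr:

```r
library(dplyr)

df <- data.frame(id = c("A", "B"), versions = c("v2 v3", "v1"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: tests v against each row's own vector, one result per row
has_version <- function(split_col, v) {
  vapply(split_col, function(x) v %in% x, logical(1))
}

df <- df %>%
  mutate(
    versions_split = strsplit(versions, split = " "),
    v1 = as.integer(has_version(versions_split, "v1")),
    v2 = as.integer(has_version(versions_split, "v2")),
    v3 = as.integer(has_version(versions_split, "v3"))
  )
```

Wrapping the test in rowwise() would be another way to get the same per-row behaviour.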

Answer:

You could also use str_detect() to find the version and then case_when() to create the new variable.

library(dplyr)
library(stringr)
df <- tibble(id= c("A", "B"),
             versions= c("v2 v3", "v1")
)

df %>%
  mutate(
    v1 = case_when(
      str_detect(versions, "v1") ~ 1,
      TRUE ~ 0
    ),
    v2 = case_when(
      str_detect(versions, "v2") ~ 1,
      TRUE ~ 0
    ),
    v3 = case_when(
      str_detect(versions, "v3") ~ 1,
      TRUE ~ 0
    )
  )
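One caveat with this approach: str_detect() matches substrings, so a pattern like "v1" would also match a version such as "v10". A small sketch using regex word boundaries (`\\b`) to match whole tokens only; row C below is hypothetical, added just to show the difference:

```r
library(dplyr)
library(stringr)

# Row C is hypothetical, included only to show the substring pitfall
df <- tibble(id = c("A", "B", "C"),
             versions = c("v2 v3", "v1", "v10"))

res <- df %>%
  mutate(
    v1_substring = as.integer(str_detect(versions, "v1")),       # also matches "v10"
    v1_word      = as.integer(str_detect(versions, "\\bv1\\b"))  # whole token only
  )
```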

Answer:

stringr::str_detect() returns a logical value, which you can turn into 1 or 0 simply by putting a + sign in front of it:

library(dplyr)
library(stringr)

# df is the tibble defined in the answer above
df %>%
  mutate(v1 = +str_detect(versions, "v1"),
         v2 = +str_detect(versions, "v2"),
         v3 = +str_detect(versions, "v3"))

Giving:

# A tibble: 2 × 5
  id    versions    v1    v2    v3
  <chr> <chr>    <int> <int> <int>
1 A     v2 v3        0     1     1
2 B     v1           1     0     0
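If the file contains many versions, writing one line per version gets repetitive. A minimal sketch, assuming the versions to flag are known in advance (`all_versions` is a hypothetical name), that loops over them and applies the same `+str_detect()` trick, with word boundaries so "v1" does not match "v10":

```r
library(stringr)
library(tibble)

df <- tibble(id = c("A", "B"), versions = c("v2 v3", "v1"))
all_versions <- c("v1", "v2", "v3")  # assumed list of versions to flag

for (v in all_versions) {
  # One integer 0/1 column per version; \\b restricts matches to whole tokens
  df[[v]] <- +str_detect(df$versions, paste0("\\b", v, "\\b"))
}
```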