I made a minimal reproducible example:
modelcoef <- c( 'model1_1_ef','model1_1_ev1','model1_1_ev2','model2_1_ef','model2_1_ev1','model2_1_ev2')
id <- 1:6
value <- c(3,1,4,6,4,6)
data<-data.frame(modelcoef,id,value)
subset1<- data %>%
subset(modelcoef %in% c('ev1','ev2'))
# This returns 0 observations, so it fails.
I am trying to subset my data based on the categorical variable "modelcoef", but the code above does not work. I want to keep only the rows where the modelcoef column contains "ev1" or "ev2".
I can do this manually for this example, but my real data is huge, so doing it manually is not an option.
CodePudding user response:
library(dplyr)
library(stringr)
library(tidyr)
modelcoef <- c( 'model1_1_ef','model1_1_ev1','model1_1_ev2','model2_1_ef','model2_1_ev1','model2_1_ev2')
id <- 1:6
value <- c(3,1,4,6,4,6)
data <- data.frame(modelcoef,id,value)
# Regular expression solution
# You can add more alternatives with the | separator (or).
# This is riskier if ev1 or ev2 could also occur elsewhere in the values of the original modelcoef variable.
subset1 <- data %>%
filter(stringr::str_detect(modelcoef, "ev1|ev2"))
# A more robust solution with tidyr
data %>%
tidyr::separate(modelcoef, into = c("model_id", "unknown_thing", "modelcoef"), sep = "_") %>%
filter(modelcoef %in% c("ev1", "ev2"))
#>   model_id unknown_thing modelcoef id value
#> 1   model1             1       ev1  2     1
#> 2   model1             1       ev2  3     4
#> 3   model2             1       ev1  5     4
#> 4   model2             1       ev2  6     6
Created on 2022-06-17 by the reprex package (v2.0.1)
The tidyr::separate() function takes a column as input and splits it into multiple columns. Your initial modelcoef column seems to follow the pattern (model_id)_(number)_(modelcoef), so if that holds for all your data, this solution should work: it splits the column into 3 separate columns. You can then use the newly created "modelcoef" variable to filter for ev1 and ev2.
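If you would rather keep the regular expression approach, one way to reduce the risk of false matches is to anchor the pattern to the end of the string. The following is only a sketch in base R: the value "model1_1_ev10" is made up for illustration and is not in your data, and it assumes the coefficient name is always the last underscore-separated part.

```r
# Hypothetical values: the last one is invented to show a false match.
modelcoef <- c("model1_1_ef", "model1_1_ev1", "model1_1_ev2", "model1_1_ev10")

# Unanchored: matches "ev1" or "ev2" anywhere in the string
loose <- grepl("ev1|ev2", modelcoef)

# Anchored: the string must end with "_ev1" or "_ev2"
strict <- grepl("_(ev1|ev2)$", modelcoef)

modelcoef[loose]   # also keeps "model1_1_ev10"
modelcoef[strict]  # keeps only the ev1 and ev2 rows
```

The same anchored pattern should work inside str_detect() in the filter() call above.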
CodePudding user response:
You need to use a regex pattern, ev(1|2) in this case:
library(dplyr)
library(stringr)
data %>%
filter(str_detect(modelcoef, "ev(1|2)"))
     modelcoef id value
1 model1_1_ev1  2     1
2 model1_1_ev2  3     4
3 model2_1_ev1  5     4
4 model2_1_ev2  6     6
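As for why the original attempt returned nothing: %in% compares whole strings, and no value in modelcoef is exactly "ev1" or "ev2". A rough base R sketch of that, with no extra packages, assuming the part after the last underscore is always the coefficient name (the names whole and suffix are just illustrative):

```r
data <- data.frame(
  modelcoef = c("model1_1_ef", "model1_1_ev1", "model1_1_ev2",
                "model2_1_ef", "model2_1_ev1", "model2_1_ev2"),
  id = 1:6,
  value = c(3, 1, 4, 6, 4, 6)
)

# %in% tests whole-string equality, so every element is FALSE here
whole <- data$modelcoef %in% c("ev1", "ev2")

# Drop everything up to and including the last underscore,
# then compare the remaining suffix exactly
suffix <- sub("^.*_", "", data$modelcoef)
subset1 <- data[suffix %in% c("ev1", "ev2"), ]
subset1
```

This keeps the original modelcoef column intact, which may be convenient if you still need the full names afterwards.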