I am trying to make new columns in a data frame that contain parts of one original column. I have tried a variety of ways using sapply and unnest but have not gotten the right results. The column of interest looks like:
df<-data_frame(file=c('U2_pN_Len', 'MM_pND_con', 'COS_CTL'))
I want to split the parts between the underscores (_) into their own columns. I tried making a reference data frame:
df2<-data_frame(cell=c('U2', 'MM', 'COS'), insert=c('pND', 'pN', 'pGFP'), trt=c('Len', 'con', 'CTL'))
I was thinking of using an if statement: if part of the original column matched a value in one of the columns of df2, it would return that match. That is, I want my final data frame to look like:
out<-data_frame(file=c('U2_pN_Len', 'MM_pND_con', 'COS_CTL'), cell=c('U2', 'MM', 'COS'), insert=c('pND', 'pN', 'pGFP'), trt=c('Len', 'con', 'CTL'))
Does anyone have advice on how to do this, using any command?
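For comparison, a plain positional split on "_" (what "separate the parts into their own columns" suggests) goes wrong on rows with fewer pieces, which is why a reference table of known values is useful here. A minimal sketch using tidyr's separate(), assuming the tidyverse is installed; the name `naive` is only illustrative:

```r
library(tidyverse)

df <- tibble(file = c('U2_pN_Len', 'MM_pND_con', 'COS_CTL'))

# Positional split on "_": pieces are assigned to columns left to right,
# and fill = "right" pads rows with too few pieces with NA at the end
naive <- df %>%
  separate(file, into = c("cell", "insert", "trt"),
           sep = "_", remove = FALSE, fill = "right")
naive
```

For COS_CTL this puts "CTL" into `insert` and leaves `trt` as NA, because position alone cannot tell a treatment from an insert.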
Answer:
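In this approach, each column of df2 is collapsed into a single regular expression of alternatives (for example "U2|MM|COS"), and str_extract() returns the first match of that pattern in file, or NA if nothing matches. Depending on how large the actual lists are in your data set, this might become unwieldy.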
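One detail worth knowing about the collapsed pattern: regex alternatives are tried left to right at each position, so when one value is a prefix of another (pN and pND here) the order in df2 decides which one is extracted. A small sketch with stringr; `inserts` and `pat` are illustrative names, and sorting longest-first is one simple workaround, not something the answer below depends on:

```r
library(stringr)

inserts <- c('pND', 'pN', 'pGFP')

# Collapsing the reference values gives one alternation pattern
str_c(inserts, collapse = "|")           # "pND|pN|pGFP"

# A shorter value listed first shadows a longer one that starts with it
str_extract("MM_pND_con", "pN|pND")      # "pN"

# Sorting the values longest-first avoids that
pat <- str_c(inserts[order(nchar(inserts), decreasing = TRUE)], collapse = "|")
pat                                      # "pGFP|pND|pN"
str_extract("MM_pND_con", pat)           # "pND"
```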
library(tidyverse)
df<-tibble(file=c('U2_pN_Len', 'MM_pND_con', 'COS_CTL'))
df2<-tibble(cell=c('U2', 'MM', 'COS'), insert=c('pND', 'pN', 'pGFP'), trt=c('Len', 'con', 'CTL'))
# Collapse each reference column into one regex of alternatives, e.g. "U2|MM|COS"
cell_pat <- str_c(df2$cell, collapse = "|")
insert_pat <- str_c(df2$insert, collapse = "|")
trt_pat <- str_c(df2$trt, collapse = "|")
# Extract the first match of each pattern from file (NA when nothing matches)
df %>%
  mutate(cell = str_extract(file, cell_pat),
         insert = str_extract(file, insert_pat),
         trt = str_extract(file, trt_pat))
#> # A tibble: 3 × 4
#> file cell insert trt
#> <chr> <chr> <chr> <chr>
#> 1 U2_pN_Len U2 pN Len
#> 2 MM_pND_con MM pND con
#> 3 COS_CTL COS <NA> CTL
Created on 2022-12-09 with reprex v2.0.2
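If the pieces are always separated by underscores, another option is to split each file name into tokens and match them exactly against df2, which avoids substring matches (with the regex approach, "con" would also be found inside a hypothetical "MM_pND_control"). A minimal sketch under that assumption; `tokens` and `pick()` are hypothetical names introduced only for this example:

```r
library(tidyverse)

df  <- tibble(file = c('U2_pN_Len', 'MM_pND_con', 'COS_CTL'))
df2 <- tibble(cell   = c('U2', 'MM', 'COS'),
              insert = c('pND', 'pN', 'pGFP'),
              trt    = c('Len', 'con', 'CTL'))

# Split every file name on "_" into a character vector of tokens
tokens <- str_split(df$file, fixed("_"))

# Hypothetical helper: return the first token that exactly equals a
# value in `ref`, or NA when none does
pick <- function(tok, ref) {
  hit <- tok[tok %in% ref]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# vapply() passes `ref` through to pick() for each element of tokens
out <- df %>%
  mutate(cell   = vapply(tokens, pick, character(1), ref = df2$cell),
         insert = vapply(tokens, pick, character(1), ref = df2$insert),
         trt    = vapply(tokens, pick, character(1), ref = df2$trt))
out
```

Because tokens are compared with %in%, the order of values in df2 no longer matters, and a file without an insert (like COS_CTL) simply gets NA in that column.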