Separate column by string patterns

Time:08-19

I have the following example:

structure(list(value = c("./LRoot_1/LClass_copepodo", "./LRoot_1/LClass_shadow", 
"./LRoot_2/LClass_bolha", "./LRoot_2/LClass_cladocera")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

I would like to separate this into two columns: the first column with the name between the slashes ("/") and the second column with the name after "LClass_".

Thanks all

CodePudding user response:

We could use extract() from tidyr: capture the substrings we need with parentheses ((...)) and drop the remaining characters. The regex below matches a literal . (escaped as \\. because an unescaped . is a metacharacter that matches any character), followed by /, then captures one or more (+) characters that are not a / ([^/]) as the first capture group, then matches the 'LClass_' substring, and finally captures the rest of the characters (.*) as the second capture group.

library(tidyr)
extract(df1, value, into = c("first", "second"),
     "\\./([^/]+)/LClass_(.*)", remove = FALSE)

-output

# A tibble: 4 × 3
  value                      first   second   
  <chr>                      <chr>   <chr>    
1 ./LRoot_1/LClass_copepodo  LRoot_1 copepodo 
2 ./LRoot_1/LClass_shadow    LRoot_1 shadow   
3 ./LRoot_2/LClass_bolha     LRoot_2 bolha    
4 ./LRoot_2/LClass_cladocera LRoot_2 cladocera
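
To see what each capture group picks up on its own, the same pattern can be tried in base R. This is only a sketch for checking the regex, not part of the answer above; the vector x is copied from the question, and the names pattern, m and parts are made up for illustration.

```r
# Sample values copied from the question
x <- c("./LRoot_1/LClass_copepodo", "./LRoot_1/LClass_shadow",
       "./LRoot_2/LClass_bolha", "./LRoot_2/LClass_cladocera")

# Same regex as in the extract() call
pattern <- "\\./([^/]+)/LClass_(.*)"

# regexec() locates the full match and each capture group;
# regmatches() returns them as a list of character vectors
m <- regmatches(x, regexec(pattern, x))

# In each list element, position 1 is the whole match,
# positions 2 and 3 are the two capture groups
parts <- data.frame(
  value  = x,
  first  = vapply(m, `[`, character(1), 2),
  second = vapply(m, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
parts
```

If a value did not match the pattern, regmatches() would return an empty vector for it and the vapply() calls would fail, so this check assumes every row follows the "./name/LClass_name" layout.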

CodePudding user response:

Here is an alternative option using separate and word:

library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  separate(col = value, into = c("first", "second"), sep = "/(?=LClass)") %>% 
  mutate(first = word(first, 2, sep = "/"),
         second = word(second, 2, sep = "_"))

# A tibble: 4 x 2
  first   second   
  <chr>   <chr>    
1 LRoot_1 copepodo 
2 LRoot_1 shadow   
3 LRoot_2 bolha    
4 LRoot_2 cladocera
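
The separator "/(?=LClass)" uses a lookahead: it matches a / only when "LClass" follows, so the split happens at the second slash and "LClass" itself is not consumed. A rough base R sketch of that intermediate step, assuming the same four values as in the question (halves, first and second are illustrative names):

```r
# Sample values copied from the question
x <- c("./LRoot_1/LClass_copepodo", "./LRoot_1/LClass_shadow",
       "./LRoot_2/LClass_bolha", "./LRoot_2/LClass_cladocera")

# perl = TRUE is required in base R for the lookahead (?=...)
halves <- strsplit(x, "/(?=LClass)", perl = TRUE)
halves[[1]]  # the two pieces produced for the first value

# Trim the leading "./" from the left piece and "LClass_" from the right piece
first  <- sub("^\\./", "", vapply(halves, `[`, character(1), 1))
second <- sub("^LClass_", "", vapply(halves, `[`, character(1), 2))
```

Here sub() stands in for the word() calls of the answer; both approaches just keep the part after the first separator.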

CodePudding user response:

You can separate() the column at / then remove the pattern LClass_ via mutate():

library(dplyr)
library(tidyr)

DF %>% 
  separate(col = value, into = c(NA, 'a', 'b'), sep = '/', remove = FALSE) %>% 
  mutate(
    b = gsub(pattern = 'LClass_', replacement = '', x = b)
  )
#> # A tibble: 4 × 3
#>   value                      a       b        
#>   <chr>                      <chr>   <chr>    
#> 1 ./LRoot_1/LClass_copepodo  LRoot_1 copepodo 
#> 2 ./LRoot_1/LClass_shadow    LRoot_1 shadow   
#> 3 ./LRoot_2/LClass_bolha     LRoot_2 bolha    
#> 4 ./LRoot_2/LClass_cladocera LRoot_2 cladocera
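
One detail worth keeping in mind with the gsub() step: it removes every occurrence of "LClass_", not only the leading one. For the data in the question that makes no difference, but a small sketch shows where it could. The second value below is hypothetical and does not come from the question.

```r
# First value is from the question; the second is a made-up edge case
b <- c("LClass_copepodo", "LClass_my_LClass_x")

# gsub() strips the pattern wherever it occurs
all_removed <- gsub("LClass_", "", b)

# sub() with a ^ anchor strips only the prefix at the start of the string
prefix_removed <- sub("^LClass_", "", b)

all_removed
prefix_removed
```

If class names can themselves contain "LClass_", the anchored sub() is probably the safer choice.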