R: create a new column from an existing one

I have the following column

col1 <- c("J-FG12", "M-L2", "F-001","J-82")

I want to create a second column that contains everything before the "-":

col2 <- c("J","M","F","J")

CodePudding user response：

library(dplyr)
df %>%
  mutate(col2 = sub("([^-]+).*", "\\1", col1))
    col1 col2
1 J-FG12    J
2   M-L2    M
3  F-001    F
4   J-82    J

Here we use a negated character class, [^-]+, which matches one or more characters other than the dash. It therefore captures whatever comes before the first -, while .* consumes the rest of the string. sub's replacement then refers to the captured part through the backreference \\1, so only that part is kept.
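The same idea can be tried on a plain vector without dplyr. This is a minimal base-R sketch, assuming the vector col1 from the question; the name col2 is just the target name used there.

```r
# Sample vector from the question
col1 <- c("J-FG12", "M-L2", "F-001", "J-82")

# ([^-]+) captures the leading run of non-dash characters;
# .* consumes the remainder, and \\1 keeps only the captured part
col2 <- sub("([^-]+).*", "\\1", col1)
col2
# [1] "J" "M" "F" "J"
```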

Data:

df <- data.frame(col1 = c("J-FG12", "M-L2", "F-001", "J-82"))

CodePudding user response：

Use trimws (its whitespace argument requires R 3.6.0 or later):

trimws(col1, whitespace = "-.*")
[1] "J" "M" "F" "J"
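Here trimws treats the regular expression "-.*" as the "whitespace" to strip, so everything from the first dash onward is removed. As a rough sketch, assuming the same col1 vector, the result can be compared with a plain sub call that deletes the same pattern; the names via_trimws and via_sub are only illustrative.

```r
col1 <- c("J-FG12", "M-L2", "F-001", "J-82")

# Strip the dash and everything after it, treated as "whitespace"
via_trimws <- trimws(col1, whitespace = "-.*")

# Delete the same pattern directly with sub
via_sub <- sub("-.*", "", col1)

identical(via_trimws, via_sub)
# [1] TRUE
```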

CodePudding user response：

Use the word function from stringr: https://stringr.tidyverse.org/reference/word.html.

library(stringr)
word(col1, sep = "-")
[1] "J" "M" "F" "J"

CodePudding user response：

We can use stringr's str_extract with a lookahead, (?=...):

library(dplyr)
library(stringr)

df %>% mutate(col2 = str_extract(col1, '^.+?(?=-)'))

    col1 col2
1 J-FG12    J
2   M-L2    M
3  F-001    F
4   J-82    J
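To see what the lookahead contributes, the sketch below runs the pattern with and without it on the question's vector. It assumes stringr is installed; the names with_lookahead and without_lookahead are only illustrative.

```r
library(stringr)

col1 <- c("J-FG12", "M-L2", "F-001", "J-82")

# Lookahead: a dash must follow, but it is not part of the match
with_lookahead <- str_extract(col1, "^.+?(?=-)")
with_lookahead
# [1] "J" "M" "F" "J"

# Without the lookahead the dash is consumed and stays in the result
without_lookahead <- str_extract(col1, "^.+?-")
without_lookahead
# [1] "J-" "M-" "F-" "J-"
```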

CodePudding user response：

library(tidyverse)

df <- tibble(col1 = c("J-FG12", "M-L2", "F-001", "J-82"))

df %>%
  mutate(col2 = str_extract(col1, "^.*-") %>%
    str_remove("-"))
#> # A tibble: 4 × 2
#>   col1   col2 
#>   <chr>  <chr>
#> 1 J-FG12 J    
#> 2 M-L2   M    
#> 3 F-001  F    
#> 4 J-82   J


#using tidyr

df %>% 
  separate(col1,
           c('col2','col1'),
           '-')
#> # A tibble: 4 × 2
#>   col2  col1 
#>   <chr> <chr>
#> 1 J     FG12 
#> 2 M     L2   
#> 3 F     001  
#> 4 J     82


#if the characters preceding `-` are all upper case
df %>%
  mutate(col2 = str_extract(col1, "[[:upper:]]+"))
#> # A tibble: 4 × 2
#>   col1   col2 
#>   <chr>  <chr>
#> 1 J-FG12 J    
#> 2 M-L2   M    
#> 3 F-001  F    
#> 4 J-82   J

#another approach using str_split and selecting the first element of the resulting lists

df$col1 %>% 
  str_split('-') %>% 
  map_chr(~.[[1]])
#> [1] "J" "M" "F" "J"

Created on 2021-11-20 by the reprex package (v2.0.1)