I have the following column
col1 <- c("J-FG12", "M-L2", "F-001","J-82")
I want to create a second column that contains everything before the "-":
col2 <- c("J","M","F","J")
CodePudding user response:
library(dplyr)
df %>%
mutate(col2 = sub("([^-]+).*", "\\1", col1))
col1 col2
1 J-FG12 J
2 M-L2 M
3 F-001 F
4 J-82 J
Here we use a negated character class, [^-], matched one or more times. It allows any character except the dash/minus -, so it captures whatever comes before the first -. We then refer to that capture in sub's replacement argument via the backreference \\1.
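The same idea can be sketched in base R without dplyr. This is a minimal illustration on the sample vector from the question; the name col2_alt is only an illustrative variable, and the second pattern is an alternative phrasing rather than the one used in the answer above.

```r
col1 <- c("J-FG12", "M-L2", "F-001", "J-82")

# Capture one or more non-dash characters, then replace the whole
# match (capture plus the rest of the string) with the capture alone.
col2 <- sub("([^-]+).*", "\\1", col1)
print(col2)

# Alternative without a capture group: delete everything from the
# first dash onwards.
col2_alt <- sub("-.*", "", col1)
print(col2_alt)
```

Both calls leave a string unchanged if it contains no dash at all.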
Data:
df <- data.frame(col1 = c("J-FG12", "M-L2", "F-001", "J-82"))
CodePudding user response:
Use trimws
trimws(col1, whitespace = "-.*")
[1] "J" "M" "F" "J"
CodePudding user response:
Use the stringr::word function (https://stringr.tidyverse.org/reference/word.html):
word(col1, sep = "-")
CodePudding user response:
We can use stringr's str_extract with a lookahead, (?=...):
library(dplyr)
library(stringr)
df %>% mutate(col2 = str_extract(col1, '^.+?(?=-)'))
col1 col2
1 J-FG12 J
2 M-L2 M
3 F-001 F
4 J-82 J
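If stringr is not available, a roughly equivalent lookahead can be sketched in base R with regexpr and regmatches. This swaps in base functions for str_extract, so treat it as an approximation of the answer above rather than the same code; the names m and hit are illustrative.

```r
col1 <- c("J-FG12", "M-L2", "F-001", "J-82")

# perl = TRUE enables lookahead; regexpr returns -1 where nothing matches.
m <- regexpr("^.+?(?=-)", col1, perl = TRUE)
hit <- m != -1

# Start from NA so that strings without a dash stay NA, as str_extract does.
col2 <- rep(NA_character_, length(col1))
col2[hit] <- regmatches(col1, m)
print(col2)
```

regmatches silently drops non-matching elements, which is why the result is filled in through the hit index instead of being assigned directly.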
CodePudding user response:
library(tidyverse)
df <- tibble(col1 = c("J-FG12", "M-L2", "F-001", "J-82"))
df %>%
mutate(col2 = str_extract(col1, "^.*-") %>%
str_remove("-"))
#> # A tibble: 4 × 2
#> col1 col2
#> <chr> <chr>
#> 1 J-FG12 J
#> 2 M-L2 M
#> 3 F-001 F
#> 4 J-82 J
#using tidyr
df %>%
separate(col1,
c('col2','col1'),
'-')
#> # A tibble: 4 × 2
#> col2 col1
#> <chr> <chr>
#> 1 J FG12
#> 2 M L2
#> 3 F 001
#> 4 J 82
#if the characters preceding `-` are all upper case
df %>%
mutate(col2 = str_extract(col1, "[[:upper:]]+"))
#> # A tibble: 4 × 2
#> col1 col2
#> <chr> <chr>
#> 1 J-FG12 J
#> 2 M-L2 M
#> 3 F-001 F
#> 4 J-82 J
#another approach using str_split and selecting the first element of the resulting lists
df$col1 %>%
str_split('-') %>%
map_chr(~.[[1]])
#> [1] "J" "M" "F" "J"
Created on 2021-11-20 by the reprex package (v2.0.1)
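Every value in the sample has exactly one dash, so all the approaches agree. As a hedged base-R sketch of where they can diverge, the vector x below adds two made-up inputs (one with two dashes, one with none) that are not part of the original data:

```r
# Hypothetical inputs: one dash, two dashes, no dash.
x <- c("J-FG12", "AB-1-2", "NODASH")

# Delete from the first dash onwards; no dash means no change.
before_first <- sub("-.*", "", x)

# Split on a literal dash and keep the first piece; same result here.
first_piece <- vapply(strsplit(x, "-", fixed = TRUE),
                      function(p) p[1], character(1))

# Greedy ".*" runs to the LAST dash, so more of the string is kept.
before_last <- sub("^(.*)-.*$", "\\1", x)

print(before_first)
print(first_piece)
print(before_last)
```

If values may contain several dashes, a pattern anchored on the first dash (or a split that keeps the first piece) is the safer match for "everything before the -".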