Extract text before backslash in R

Time:01-26

In R, I have a data frame with a Player column whose values look like Joey Votto\vottojo01 (a name, a backslash, then an ID).

In the Player column, I only need the text before the backslash.

Desired output

Joey Votto

Juan Soto

Charlie Blackmon

Freddie Freeman

Here is the dput result

structure(list(Player = c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01", 
"Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01"), 
    TOB = c(321, 304, 288, 274), TB = c(323, 268, 387, 312), 
    G = c(162, 151, 159, 162), WAR = c(8.1, 7.1, 5.5, 5.5)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

I would prefer a code that works in R with dplyr.

I have tried to get it to work with this code, but it did not work:

mutate(Name = str_extract(Player, "(?=\\)"))

I have looked at the solutions Stack Overflow suggested, but did not see one that fit my situation. If there is one that I missed, please let me know.

CodePudding user response:

From R 4.0.0, you can use raw strings, so the regex escape \\ (one literal backslash) no longer has to be written as "\\\\". Use the syntax r"(your_raw_expression)" (parentheses included). Here we can do:

str_remove(df$Player, r"(\\.*)")
# [1] "Joey Votto"       "Juan Soto"        "Charlie Blackmon" "Freddie Freeman"

str_extract(df$Player, r"(^.*(?=\\))")
# [1] "Joey Votto"       "Juan Soto"        "Charlie Blackmon" "Freddie Freeman"
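
To make the escaping concrete, here is a small base R sketch (the variable names are illustrative, not from the answer) checking that the raw string and the conventionally escaped string hold exactly the same regex. It assumes R 4.0.0 or later for the r"(...)" syntax.

```r
# Raw-string literal (R >= 4.0.0): what you type is what the regex engine sees.
raw_pattern <- r"(\\.*)"

# Conventional literal: every backslash must be doubled for R's string parser.
escaped_pattern <- "\\\\.*"

# Both hold the same four characters: backslash, backslash, dot, star.
same_pattern <- identical(raw_pattern, escaped_pattern)
pattern_length <- nchar(raw_pattern)

# writeLines() prints the string as the regex engine receives it.
writeLines(raw_pattern)
```

So the raw string only removes R's own layer of escaping; the regex still needs \\ to mean one literal backslash.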

CodePudding user response:

Another possible solution, based on stringr::str_extract:

library(tidyverse)

df <- structure(list(Player = c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01", 
"Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01"), 
    TOB = c(321, 304, 288, 274), TB = c(323, 268, 387, 312), 
    G = c(162, 151, 159, 162), WAR = c(8.1, 7.1, 5.5, 5.5)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

df %>% 
  mutate(Player = str_extract(Player, "^.*(?=\\\\)"))

#> # A tibble: 4 × 5
#>   Player             TOB    TB     G   WAR
#>   <chr>            <dbl> <dbl> <dbl> <dbl>
#> 1 Joey Votto         321   323   162   8.1
#> 2 Juan Soto          304   268   151   7.1
#> 3 Charlie Blackmon   288   387   159   5.5
#> 4 Freddie Freeman    274   312   162   5.5
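
For comparison with the attempt in the question, the sketch below shows why "(?=\\)" fails. It uses base R with perl = TRUE rather than stringr, and the variable names are illustrative. After R parses the string, the regex is (?=\), where the backslash escapes the closing parenthesis and leaves the group unclosed. Doubling the backslashes again gives the regex (?=\\), a lookahead for one literal backslash. stringr's regex engine should reject the original pattern for the same reason, though that is not checked here.

```r
players <- c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01")

# The pattern from the question: the regex engine sees (?=\) and cannot
# compile it. Any warning or error R raises is treated as a failure here.
question_pattern_fails <- tryCatch(
  {
    regexpr("(?=\\)", players, perl = TRUE)
    FALSE
  },
  warning = function(w) TRUE,
  error = function(e) TRUE
)

# The corrected pattern: the regex engine sees ^.*(?=\\), i.e. everything
# from the start of the string up to (but not including) the last backslash.
hits <- regexpr("^.*(?=\\\\)", players, perl = TRUE)
names_only <- regmatches(players, hits)
```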

CodePudding user response:

We may use str_remove to match the backslash and everything after it, and remove that match (str1 is defined under "data" at the end)

library(stringr)
str_remove(str1, "\\\\.*")
[1] "Charlie Blackmon" "Freddie Freeman" 

The same in tidyverse syntax

library(dplyr)
df1 <- df1 %>%
    mutate(Player = str_remove(Player, "\\\\.*"))

-output

df1
# A tibble: 4 × 5
  Player             TOB    TB     G   WAR
  <chr>            <dbl> <dbl> <dbl> <dbl>
1 Joey Votto         321   323   162   8.1
2 Juan Soto          304   268   151   7.1
3 Charlie Blackmon   288   387   159   5.5
4 Freddie Freeman    274   312   162   5.5

Or using base R with trimws

trimws(df1$Player, whitespace = "\\\\.*")
[1] "Joey Votto"       "Juan Soto"        "Charlie Blackmon" "Freddie Freeman" 
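
trimws is normally meant for stripping whitespace, so the call above works by redefining what counts as "whitespace". As a sketch of a more conventional base R spelling (not part of the answer; players is an illustrative vector built from the question's data), sub with the same pattern should give the same result. The whitespace argument assumes R 3.6.0 or later.

```r
players <- c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01",
             "Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01")

# sub() replaces the first match: a literal backslash and everything after it.
via_sub <- sub("\\\\.*", "", players)

# The trimws() call from the answer, for comparison.
via_trimws <- trimws(players, whitespace = "\\\\.*")

same_result <- identical(via_sub, via_trimws)
```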

data

str1 <- c("Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01")