In R, I have a data frame that contains this column of text:
In the Player column, I only need the text before the backslash.
Desired output
Joey Votto
Juan Soto
Charlie Blackmon
Freddie Freeman
Here is the dput result
structure(list(Player = c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01",
"Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01"),
TOB = c(321, 304, 288, 274), TB = c(323, 268, 387, 312),
G = c(162, 151, 159, 162), WAR = c(8.1, 7.1, 5.5, 5.5)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
I would prefer a solution that works with dplyr.
I have tried to get it to work with this code, but it did not work:
mutate(Name = str_extract(Player, "(?=\\)"))
I have looked at the solutions Stack Overflow suggested, but did not see one that fit my situation. If there is one that I missed, please let me know.
CodePudding user response:
From R 4.0.0 onward, you can use raw strings, so a backslash no longer has to be escaped a second time for the R string itself. The syntax is r"(your_raw_expression)" (parentheses included). The regex still needs \\ to match one literal backslash. Here we can do:
str_remove(df$Player, r"(\\.*)")
# [1] "Joey Votto" "Juan Soto" "Charlie Blackmon" "Freddie Freeman"
str_extract(df$Player, r"(^.*(?=\\))")
# [1] "Joey Votto" "Juan Soto" "Charlie Blackmon" "Freddie Freeman"
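To show why the attempt in the question fails, and that the raw string is just another spelling of the escaped pattern, here is a minimal sketch. The vector name player is made up for illustration, and its values are copied from the question.

```r
library(stringr)

# Hypothetical sample vector, copied from the question's Player column
player <- c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01")

# The raw string and the doubly escaped string are the same pattern
same_pattern <- identical(r"(\\.*)", "\\\\.*")

# "(?=\\)" reaches the regex engine as (?=\) : the \) is an escaped
# parenthesis, so the lookahead group is never closed and stringr errors
attempt <- tryCatch(
  str_extract(player, "(?=\\)"),
  error = function(e) "regex error"
)

# With the backslash itself escaped and ^.* in front, the lookahead works
names_only <- str_extract(player, r"(^.*(?=\\))")
```

Closing the group alone would not be enough: a bare lookahead matches zero characters, which is why the working pattern puts ^.* in front of it.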
CodePudding user response:
Another possible solution, based on stringr::str_extract:
library(tidyverse)
df <- structure(list(Player = c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01",
"Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01"),
TOB = c(321, 304, 288, 274), TB = c(323, 268, 387, 312),
G = c(162, 151, 159, 162), WAR = c(8.1, 7.1, 5.5, 5.5)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
df %>%
mutate(Player = str_extract(Player, "^.*(?=\\\\)"))
#> # A tibble: 4 × 5
#> Player TOB TB G WAR
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Joey Votto 321 323 162 8.1
#> 2 Juan Soto 304 268 151 7.1
#> 3 Charlie Blackmon 288 387 159 5.5
#> 4 Freddie Freeman 274 312 162 5.5
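If the part after the backslash is also worth keeping, a related option is to split the column instead of trimming it. This is only a sketch: the column names Name and ID are made up for illustration, the small tibble is a cut-down copy of the question's data, and tidyr::separate() is marked as superseded in recent tidyr releases, although it is still available.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the question's data (two rows, two columns)
df_small <- tibble(
  Player = c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01"),
  TOB = c(321, 304)
)

# "\\\\" is the regex for one literal backslash; Player is replaced
# by the two new columns, Name and ID (hypothetical names)
split_df <- df_small %>%
  separate(Player, into = c("Name", "ID"), sep = "\\\\")
```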
CodePudding user response:
We may use str_remove to match the backslash together with everything after it, and remove that part:
library(stringr)
str_remove(str1, "\\\\.*")
[1] "Charlie Blackmon" "Freddie Freeman"
Or, using tidyverse syntax:
library(dplyr)
df1 <- df1 %>%
mutate(Player = str_remove(Player, "\\\\.*"))
-output
df1
# A tibble: 4 × 5
Player TOB TB G WAR
<chr> <dbl> <dbl> <dbl> <dbl>
1 Joey Votto 321 323 162 8.1
2 Juan Soto 304 268 151 7.1
3 Charlie Blackmon 288 387 159 5.5
4 Freddie Freeman 274 312 162 5.5
Or using base R with trimws:
trimws(df1$Player, whitespace = "\\\\.*")
[1] "Joey Votto" "Juan Soto" "Charlie Blackmon" "Freddie Freeman"
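For comparison, two other base R spellings of the same idea, as a sketch only. The vector player is a hypothetical stand-in for df1$Player.

```r
# Hypothetical stand-in for df1$Player
player <- c("Joey Votto\\vottojo01", "Juan Soto\\sotoju01")

# Remove the first backslash and everything after it
via_sub <- sub("\\\\.*", "", player)

# Split on a literal backslash (fixed = TRUE, so no regex escaping)
# and keep the first piece of each element
via_split <- vapply(strsplit(player, "\\", fixed = TRUE), `[`, character(1), 1)
```

Both results can be assigned back with df1$Player <- ... or used inside mutate().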
data
str1 <- c("Charlie Blackmon\\blackch02", "Freddie Freeman\\freemfr01")