I am working with a large dataset and am having difficulty removing the last N characters from my variable pitchResult. Below is a sample of what the column looks like:
pitchResult <- c("strike", "Home Run on 368.96 ft fly ball", "Ground out","Home Run
on a 415.0 ft fly ball", "Home Run on a 401.77 ft line drive", "ball")
However, I only want to remove the last N characters from observations that start with Home Run, so that only Home Run is displayed and not the information that follows. I was thinking gsub could work, but I am unsure how to apply it conditionally in this fashion. Thanks all in advance!
CodePudding user response:
ifelse would probably help. Like this:
library(stringr)
> pitchResult
[1] "strike" "Home Run on 368.96 ft fly ball"
[3] "Ground out" "Home Run \non a 415.0 ft fly ball"
[5] "Home Run on a 401.77 ft line drive" "ball"
> ifelse(grepl("Home Run",pitchResult),str_extract(pitchResult,"Home Run"),pitchResult)
[1] "strike" "Home Run" "Ground out" "Home Run" "Home Run"
[6] "ball"
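The same conditional idea can also be sketched in base R without stringr. This is only an illustration: it assumes R 3.3.0 or later (for startsWith), and the names is_hr and cleaned are made up for the example. The sample vector is a shortened copy of the one in the question.

```r
# Shortened copy of the question's sample data
pitchResult <- c("strike", "Home Run on 368.96 ft fly ball", "Ground out",
                 "Home Run on a 401.77 ft line drive", "ball")

# TRUE for every value that begins with "Home Run"
is_hr <- startsWith(pitchResult, "Home Run")

# Overwrite only those values; everything else is left untouched
cleaned <- pitchResult
cleaned[is_hr] <- "Home Run"

print(cleaned)
```

Because startsWith does a plain prefix comparison, no regular expression is involved, so line breaks or special characters in the rest of the string do not matter.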
CodePudding user response:
You don't need an ifelse call. Just use a regex backreference with capturing parentheses.
> gsub(x = pitchResult, pattern = "(Home Run).*$", replacement = "\\1")
[1] "strike" "Home Run" "Ground out" "Home Run" "Home Run" "ball"
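If only values that begin with Home Run should be shortened, as the question describes, one possible variation is to anchor the pattern with ^. The sketch below uses a made-up vector x (not from the question) whose last element mentions Home Run in the middle of the string, to show the difference between the anchored and unanchored patterns.

```r
# Hypothetical sample: the last value contains "Home Run" mid-string
x <- c("Home Run on 368.96 ft fly ball", "Ground out",
       "Robbed of a Home Run at the wall")

# Anchored: only strings that start with "Home Run" are shortened
anchored <- sub("^(Home Run).*$", "\\1", x)

# Unanchored: anything after the first "Home Run" is dropped, wherever it occurs
unanchored <- sub("(Home Run).*$", "\\1", x)

print(anchored)
print(unanchored)
```

Here the anchored version leaves the third value unchanged, while the unanchored version trims it to "Robbed of a Home Run". Since each string needs at most one replacement, sub is enough in place of gsub.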