R How to conditionally remove the last N characters from multiple observations

Time:12-24

I am working with a large dataset and am having difficulty removing the last N characters from my variable pitchResult. Below is a sample of what the column looks like:

pitchResult <- c("strike", "Home Run on 368.96 ft fly ball", "Ground out",
                 "Home Run \non a 415.0 ft fly ball",
                 "Home Run on a 401.77 ft line drive", "ball")

However, I only want to remove the trailing characters from observations that start with Home Run, so that only Home Run is displayed and not the details that follow. I was thinking gsub could work, but I am unsure how to apply it conditionally. Thanks in advance!

CodePudding user response:

ifelse would probably help. Like this:

library(stringr)

> pitchResult
[1] "strike"                             "Home Run on 368.96 ft fly ball"    
[3] "Ground out"                         "Home Run \non a 415.0 ft fly ball" 
[5] "Home Run on a 401.77 ft line drive" "ball"   
                           
> ifelse(grepl("Home Run",pitchResult),str_extract(pitchResult,"Home Run"),pitchResult)
[1] "strike"     "Home Run"   "Ground out" "Home Run"   "Home Run"  
[6] "ball"      
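
Note that grepl("Home Run", ...) matches the phrase anywhere in the string, while the question asks only about observations that start with it. A minimal base R sketch of that stricter check, assuming R 3.3.0 or later for startsWith (the names is_home_run and cleaned are made up for illustration):

```r
pitchResult <- c("strike", "Home Run on 368.96 ft fly ball", "Ground out",
                 "Home Run \non a 415.0 ft fly ball",
                 "Home Run on a 401.77 ft line drive", "ball")

# TRUE only for entries that begin with the literal prefix "Home Run"
is_home_run <- startsWith(pitchResult, "Home Run")

# Replace those entries with the bare label; leave everything else untouched
cleaned <- ifelse(is_home_run, "Home Run", pitchResult)
cleaned
```

Because startsWith compares a literal prefix, no regex escaping is needed, and no extra package has to be loaded.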

CodePudding user response:

You don't need ifelse. Just use a regex backreference with capturing parentheses.

> gsub(x = pitchResult, pattern = "(Home Run).*$", replacement = "\\1")
[1] "strike"     "Home Run"   "Ground out" "Home Run"   "Home Run"   "ball"   
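
Since the question mentions a larger dataset, the same idea can be applied directly to a column. The sketch below assumes a hypothetical data frame named pitches. It anchors the pattern with ^ so only values that start with Home Run are changed, and it uses perl = TRUE with (?s) so that . explicitly matches the embedded newline:

```r
# Hypothetical data frame standing in for the larger dataset
pitches <- data.frame(
  pitchResult = c("strike", "Home Run on 368.96 ft fly ball", "Ground out",
                  "Home Run \non a 415.0 ft fly ball",
                  "Home Run on a 401.77 ft line drive", "ball"),
  stringsAsFactors = FALSE
)

# ^ anchors the match to the start of the string, (?s) lets . match "\n",
# and \\1 keeps only the captured "Home Run"
pitches$pitchResult <- sub("(?s)^(Home Run).*$", "\\1",
                           pitches$pitchResult, perl = TRUE)
pitches$pitchResult
```

sub is enough here because the anchored pattern can match at most once per string. Values that do not start with Home Run are returned unchanged.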