How to remove parenthesis and everything in between in a string in R where the parenthesis could be

Time:10-26

I have a data set that gives me tennis scores as character strings, such as "6-4 3-6 6-2" and "7-6(6) 6-2". I want to add up all of the games played in each match, so I need to remove the hyphens, the spaces, and the tiebreak score, which in the second example is the (6). Then I want to convert what is left to doubles and add up every individual number to get the total games played in the match. For the first and second examples, the totals would be 27 and 21 respectively.

So far I can remove the dashes and spaces with the stringr package, using str_replace_all(score, c("-" = "", " " = "")), which gives me a string of digits. I can't figure out how to remove the tiebreak scores, because the value between the parentheses could be anything. I need to replace "(...)" with "", where any string could be inside the parentheses (in my case it is only one number). Also, the parentheses could appear anywhere in the string.

CodePudding user response:

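games <- c("6-4 3-6 6-2", "7-6(6) 6-2")

# Remove every "(...)" group; [^)]* keeps each match inside one pair of parentheses
gsub("\\([^)]*\\)", "", games) |>
  # Split on runs of hyphens or spaces, so multi-digit game counts stay whole
  strsplit(split = "[- ]+") |>
  sapply(function(x) sum(as.numeric(x), na.rm = TRUE))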
[1] 27 21
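
If a match can contain more than one tiebreak, or a set that runs past nine games, another option is to pull the numbers out directly instead of splitting on separators. This is only a sketch in base R: the helper name total_games and the third example score are made up for illustration and are not part of the original question or answer.

```r
# Hypothetical helper (not from the original answer): total games per match.
total_games <- function(score) {
  # Drop every "(...)" group; [^)]* stops at the first closing parenthesis,
  # so several tiebreaks in one string are removed independently.
  no_tiebreak <- gsub("\\([^)]*\\)", "", score)
  # Extract each run of digits, so a game count such as 10 stays one number.
  digits <- regmatches(no_tiebreak, gregexpr("[0-9]+", no_tiebreak))
  # Sum the games for each match.
  vapply(digits, function(x) sum(as.numeric(x)), numeric(1))
}

# The third score is an invented example with two tiebreaks and a long set.
games <- c("6-4 3-6 6-2", "7-6(6) 6-2", "6-7(5) 7-6(3) 10-8")
total_games(games)
# [1] 27 21 44
```

Extracting digit runs avoids having to list every possible separator, at the cost of also picking up any other numbers that might appear in the string.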