I would like to do this with gsub and/or dplyr in R:
Ok here's the example text:
example_string <-
"Bing Bloop Doop:-14490 Flerp:01 ScoobyDoot:Z1Bling Blong:Zootsuitssasdfasdf"
What I'd like to get:
"Bing Bloop Doop: Flerp: ScoobyDoot:Z1Bling Blong:Zootsuit"
I'd like to strip all the numbers (and any hyphens) except for the Z# and then limit the nchar after the last colon to 9 char. There are always 4 colons.
I'm going through all different kinds of threads and they get close sometimes but no cigar.
I was able to remove all digits:
gsub('[0-9]+', '', example_string)
But this doesn't trim nchar to 9 at the end and also takes out the Z1 part:
"Bing Bloop Doop: Flerp: ScoobyDoot:ZBling Blong:Zootsuitssasdfasdf"
Threads I have looked at so far:
Remove n number of characters in a string after a specific character
Regex allow a string to only contain numbers 0 - 9 and limit length to 45
Answer:
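Here is an option using a combination of base R functions (strsplit, lapply and gsub). Split the string on the colons, then split each piece on whitespace, blank out every token that consists only of digits (with an optional leading hyphen), and finally re-collapse the pieces with spaces and colons. Because "Z1Bling" is not made up purely of digits, the Z1 is kept. Note that this only handles the digit removal; it does not trim the text after the last colon to 9 characters. The \(x) and |> syntax needs R 4.1 or later.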
# Split the string by the colon
colon_split <- unlist(strsplit(example_string, ":"))
# Over all strings split by the colon
digits_out <- lapply(colon_split, \(x) {
  space <- unlist(strsplit(x, "\\s"))
  gsub("^-(\\d*)$|^(\\d*)$", "", space) |> paste0(collapse = " ")
})
# Regroup and collapse using the colon
paste0(digits_out, collapse = ":")
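The code above leaves the last field untrimmed. One way to add the missing step is sketched below, under some assumptions: clean_string and its max_chars argument are hypothetical names made up for this illustration, and the single-regex variant at the end is an alternative technique, not part of the answer above. It assumes the digits to keep always directly follow a letter (as in Z1).

```r
example_string <-
  "Bing Bloop Doop:-14490 Flerp:01 ScoobyDoot:Z1Bling Blong:Zootsuitssasdfasdf"

# Hypothetical helper (not from the original answer): blank out standalone
# numbers, then cap the text after the last colon at `max_chars` characters.
clean_string <- function(x, max_chars = 9) {
  parts <- unlist(strsplit(x, ":", fixed = TRUE))
  parts <- vapply(parts, function(p) {
    tokens <- unlist(strsplit(p, "\\s"))
    # Blank tokens that are only digits, with an optional leading hyphen
    paste(sub("^-?\\d+$", "", tokens), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  # Truncate the last field (the text after the final colon)
  n <- length(parts)
  parts[n] <- substr(parts[n], 1, max_chars)
  paste(parts, collapse = ":")
}

clean_string(example_string)
# "Bing Bloop Doop: Flerp: ScoobyDoot:Z1Bling Blong:Zootsuits"

# Alternative sketch with two regex calls instead of splitting:
# 1) remove digit runs (and a leading hyphen) that do not follow a letter
no_digits <- gsub("(?<![A-Za-z])-?\\d+", "", example_string, perl = TRUE)
# 2) keep everything up to the last colon plus at most 9 characters after it
regex_result <- sub("^(.*:.{0,9}).*$", "\\1", no_digits, perl = TRUE)
```

With a limit of 9 characters the last field comes out as "Zootsuits"; the desired output in the question shows "Zootsuit", which is 8 characters, so clean_string(example_string, max_chars = 8) would reproduce it exactly.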