I have a specific issue with character substitution in strings:
If I have the following strings
"..A.B....c...A..D.."
"A..S.E.Q.......AW.."
".B.C..a...R......Ds"
Which regex substitution should I use to drop the leading and trailing dots and replace each inner run of dots with a single underscore, to obtain the following strings?
"A_B_c_A_D"
"A_S_E_Q_AW"
"B_C_a_R_Ds"
I am using R.
Thanks in advance!
CodePudding user response:
Using stringr from the ever fantastic tidyverse.
library(stringr)  # also provides the %>% pipe
str1 <- "..A.B....c...A..D.."
str1 %>%
  # replace each run of dots that sits between two word characters
  # ('\\.' is a literal dot, '+' matches one or more, '(?<=\\w)' is a lookbehind and '(?=\\w)' a lookahead for a word character)
  str_replace_all('(?<=\\w)\\.+(?=\\w)', '_') %>%
  # delete the remaining dots (i.e. at the start and end)
  str_remove_all('\\.')
As always, there are plenty of ways to skin the cat with regex.
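For readers who would rather not load a package, the same lookaround idea can be sketched in base R. This is only an illustration, assuming gsub with perl = TRUE (needed for lookbehind and lookahead); the names x and out are made up for the example.

```r
x <- c("..A.B....c...A..D..",
       "A..S.E.Q.......AW..",
       ".B.C..a...R......Ds")

# step 1: turn each run of dots between two word characters into one "_"
# (perl = TRUE enables the lookbehind/lookahead syntax)
inner <- gsub("(?<=\\w)\\.+(?=\\w)", "_", x, perl = TRUE)

# step 2: drop whatever dots are left (the leading and trailing ones)
out <- gsub(".", "", inner, fixed = TRUE)
out
## [1] "A_B_c_A_D"  "A_S_E_Q_AW" "B_C_a_R_Ds"
```

Both steps are vectorised, so all three strings are handled in one go.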
CodePudding user response:
Here is a solution using gsub in two steps:
string = c("..A.B....c...A..D..","A..S.E.Q.......AW..",".B.C..a...R......Ds")
First, remove the leading and trailing dots:
string2 = gsub("^\\.+|\\.+$", "", string)
Finally, replace each run of one or more dots with _:
string2 = gsub("\\.+", "_", string2)
CodePudding user response:
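Using x shown in the Note at the end, use trimws to trim the dots off both ends. In a regex, a dot matches any character, so we have to escape it with backslashes to remove that meaning. Then replace every remaining dot with an underscore using chartr. Note that chartr maps character by character, so a run of dots becomes a run of underscores. No packages are used.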
x |> trimws("both", "\\.") |> chartr(old = ".", new = "_")
## [1] "A_B____c___A__D" "A__S_E_Q_______AW" "B_C__a___R______Ds"
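The output above keeps one underscore per dot, whereas the question asks for a single underscore per run. One possible tweak, offered as a sketch rather than as part of the original answer, is to keep the trimws step and swap chartr for a gsub that collapses each run. The name res is made up for the example.

```r
x <- c("..A.B....c...A..D..",
       "A..S.E.Q.......AW..",
       ".B.C..a...R......Ds")

# trim the dots at both ends, then collapse each remaining run of dots into one "_"
res <- gsub("\\.+", "_", trimws(x, "both", "\\."))
res
## [1] "A_B_c_A_D"  "A_S_E_Q_AW" "B_C_a_R_Ds"
```

This still uses no packages; the whitespace argument of trimws needs R 3.6.0 or later.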
Note
x <- c("..A.B....c...A..D..",
"A..S.E.Q.......AW..",
".B.C..a...R......Ds")