I have errors in some of my numbers, which show up like "59.34343.23". I know the first dot is correct, but the second one (and any after it) should be removed. How can I remove those?
I tried using gsub in R:
gsub("(?<=\\..*)\\.", "", "59.34343.23", perl=T)
or
gsub("(?<!^[^.]*)\\.", "", "59.34343.23", perl=T)
However, both calls fail with the error "invalid regular expression", even though the same patterns work in a regex tester. What is my mistake here?
CodePudding user response:
We may use
gsub("^[^.]+\\.(*SKIP)(*FAIL)|\\.", "", str1, perl = TRUE)
[1] "59.3434323"
data
str1 <- "59.34343.23"
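As a rough sketch of how the (*SKIP)(*FAIL) approach behaves on a whole vector: the values below are hypothetical apart from the first one, and the pattern assumes each value has at least one character before its first dot (because of [^.]+).

```r
# Hypothetical sample values; only the first one comes from the question.
x <- c("59.34343.23", "59.34", "5934", "1.2.3.4")

# The first alternative matches the text up to and including the first dot,
# then (*SKIP)(*FAIL) discards that match and resumes after it, so only the
# remaining dots are matched by the second alternative and removed.
fixed <- gsub("^[^.]+\\.(*SKIP)(*FAIL)|\\.", "", x, perl = TRUE)
print(fixed)
```

Values with one dot or no dot at all come back unchanged here, so the call can be applied to a mixed column.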
CodePudding user response:
You can use
gsub("^([^.]*\\.)|\\.", "\\1", "59.34343.23")
gsub("^([^.]*\\.)|\\.", "\\1", "59.34343.23", perl=TRUE)
See the R demo online and the regex demo.
Details:
^([^.]*\.) - Capturing group 1 (referred to as \1 from the replacement pattern): zero or more chars other than a dot from the start of the string, and then a . char (the first in the string)
| - or
\. - any other dot in the string.
Since the replacement, \1, refers to Group 1, and Group 1 only holds a value when the text up to and including the first dot is matched, the replacement is either that part of the text or an empty string (i.e. the second and all subsequent dots are removed).
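To illustrate the capture-group idea on more than one value, here is a small sketch. The extra sample values and the as.numeric() step are assumptions added for illustration, not part of the original answer.

```r
# Hypothetical sample values; only the first one comes from the question.
x <- c("59.34343.23", ".5.3", "59.34", "5934")

# Group 1 holds the text up to and including the first dot; for any other
# dot the group is empty, so replacing with \\1 drops that dot.
cleaned <- gsub("^([^.]*\\.)|\\.", "\\1", x, perl = TRUE)
print(cleaned)

# Once the extra dots are gone the strings can be parsed as numbers.
values <- as.numeric(cleaned)
print(values)
```

Because [^.]* may match nothing, a value that starts with a dot keeps that leading dot as its first one.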
CodePudding user response:
By specifying perl = TRUE you can convert matches of the following regular expression to empty strings:
^[^.]*\.[^.]*\K.|\.
If you are unfamiliar with \K, it resets the start of the reported match, so the text matched before it is left out of what gets replaced.
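A minimal sketch of that pattern inside gsub(). The second sample value is hypothetical, and the pattern is only exercised here on inputs that contain two or more dots, since as written it relies on a second dot being present.

```r
# Inputs with at least two dots; the second value is a made-up example.
x <- c("59.34343.23", "1.2.3.4")

# Everything up to the second dot is matched and then dropped from the
# reported match by \K, so only the second dot is replaced; any later
# dots are caught by the plain \. alternative.
res <- gsub("^[^.]*\\.[^.]*\\K.|\\.", "", x, perl = TRUE)
print(res)
```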
CodePudding user response:
There is always the option to write the dot back only if it's the first in the line.
The key feature is to consume the other dots but not write them back.
The effect is to delete every dot after the first.
Below uses a branch reset to accomplish the goal (Perl mode).
(?m)(?|(^[^.\n]*\.)|()\.)
Replace with $1 (written as "\\1" in an R replacement string).
https://regex101.com/r/cHcu4j/1
(?m)
(?|
    ( ^ [^.\n]* \. )    # (1)
  | ( )                 # (1)
    \.
)
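In R the branch-reset version might look like the sketch below. The multi-line sample string is an assumption added to show what the (?m) flag does; it is not from the question.

```r
# First value is from the question; the second is a hypothetical
# two-line string to show the per-line behaviour of (?m).
x <- c("59.34343.23", "1.2.3\n4.5.6")

# Both branches share group number 1: it holds the text through the first
# dot of a line, or an empty string for any later dot.
res <- gsub("(?m)(?|(^[^.\\n]*\\.)|()\\.)", "\\1", x, perl = TRUE)
print(res)
```

With (?m), ^ also matches right after a newline, so each line keeps its own first dot.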
CodePudding user response:
The patterns that you tried are rejected because the lookbehind, e.g. (?<=\\..*), contains an unbounded quantifier, which PCRE (the engine R uses with perl=TRUE) does not support inside a lookbehind.
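A small sketch to reproduce the failure without stopping the script. The variable name failed is just for illustration, and the exact wording of the error message may differ between R and PCRE versions.

```r
x <- "59.34343.23"

# gsub() signals an error when PCRE cannot compile the pattern; tryCatch()
# turns that into a TRUE/FALSE flag, and suppressWarnings() hides the
# accompanying compilation warning.
failed <- tryCatch({
  suppressWarnings(gsub("(?<=\\..*)\\.", "", x, perl = TRUE))
  FALSE
}, error = function(e) TRUE)
print(failed)
```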
Another variation using \G to get continuous matches after the first dot:
(?:^[^.]*\.|\G(?!^))[^.]*\K\.
In parts, the pattern matches:
(?: - Non-capture group for the alternation |
^[^.]*\. - Start of string, match any chars except . and then match .
| - Or
\G(?!^) - Assert the position at the end of the previous match (not at the start)
)[^.]* - Close the group, then optionally match any chars except .
\K\. - Clear the match buffer and match the dot (to be removed)
gsub("(?:^[^.]*\\.|\\G(?!^))[^.]*\\K\\.", "", "59.34343.23", perl=T)
Output
[1] "59.3434323"