I have a long text that contains data like:
23cm,
23m,
60 cm,
60 m,
So sometimes there is a space between the number and the unit, and sometimes there isn't. How can I put an underscore between them in each case, so the result would be:
23_cm,
23_m,
60_cm,
60_m
The search pattern for part of it is probably (\d) (?:cm|m), but I can't figure out the rest.
CodePudding user response:
I suggest replacing matches of
(?<=\d) ?(?=c?m,)
with an underscore. If a space is present it is matched; else the (zero-width) location between the last digit and 'cm' or 'm' is matched.
The regular expression can be broken down as follows. (I have enclosed the space in a character class to make it visible to the reader.)
(?<= # begin a positive lookbehind
\d # match a digit
) # end positive lookbehind
[ ]? # optionally match a space
(?= # begin a positive lookahead
c?m, # optionally match a 'c' followed by 'm,'
) # end positive lookahead
If the comma is not always present, replace (?=c?m,) with (?=c?m\b), \b being a word boundary.
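For anyone who wants to try this outside a regex tester, here is a minimal Python sketch of the lookaround approach, using the word-boundary variant. The names GAP and add_underscores are made up for illustration, and Python is only an assumption since the question does not name a language.

```python
import re

# Lookbehind for a digit, an optional space, then a lookahead for 'cm' or 'm'
# ending at a word boundary. Only the optional space is ever consumed.
GAP = re.compile(r"(?<=\d) ?(?=c?m\b)")


def add_underscores(text: str) -> str:
    """Replace the (possibly empty) gap between number and unit with '_'."""
    return GAP.sub("_", text)


print(add_underscores("23cm, 23m, 60 cm, 60 m,"))
# 23_cm, 23_m, 60_cm, 60_m,
```

Because the digit and the unit sit inside lookarounds, the replacement is just a plain underscore and no backreferences are needed.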
CodePudding user response:
We can use capturing groups. The following example uses \2 and \3 for the capturing groups. Some languages would use $2 and $3.
See https://regex101.com/r/KxYyrb/1
input string
23cm, 23m, 60 cm, 60 m,
pattern
((\d+)\s?(m|cm))
replace using
\2_\3
output
23_cm, 23_m, 60_cm, 60_m,
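As a rough sketch, the same replacement in Python could look like the following. It assumes the pattern is meant to read ((\d+)\s?(m|cm)), with \d+ for the number; PATTERN and join_with_groups are illustrative names only.

```python
import re

# Group 1 wraps the whole match, group 2 is the number, group 3 is the unit.
PATTERN = re.compile(r"((\d+)\s?(m|cm))")


def join_with_groups(text: str) -> str:
    """Rebuild each match as number + '_' + unit, dropping any space."""
    return PATTERN.sub(r"\2_\3", text)


print(join_with_groups("23cm, 23m, 60 cm, 60 m,"))
# 23_cm, 23_m, 60_cm, 60_m,
```

Note that this pattern has no word boundary after the unit, so a digit followed by a word starting with m (for example "5 men") would also be rewritten.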
CodePudding user response:
You can use
(\d)\s?(c?m)\b
The replacement pattern is $1_$2.
See the regex demo.
Details:
(\d) - Capturing group 1: a digit
\s? - an optional whitespace char
(c?m) - Capturing group 2: an optional c and an m
\b - a word boundary (else, the regex will match m in men, for example).
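A quick way to check the word-boundary behaviour is a small Python sketch like the one below. Python writes the backreferences as \1 and \2 rather than $1 and $2; UNIT and tag_units are hypothetical names used only for this example.

```python
import re

# Group 1: the last digit of the number; group 2: 'cm' or 'm'.
# The trailing \b stops the pattern from matching the 'm' in words like 'men'.
UNIT = re.compile(r"(\d)\s?(c?m)\b")


def tag_units(text: str) -> str:
    """Insert '_' between the number and the unit, replacing any whitespace."""
    return UNIT.sub(r"\1_\2", text)


print(tag_units("23cm, 23m, 60 cm, 60 m,"))
# 23_cm, 23_m, 60_cm, 60_m,
print(tag_units("5 men"))
# 5 men
```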