Regex to add underscore between number and unit (or replace whitespace with underscore between numbe

Time:04-26

I have a long text that contains data like:

23cm,
23m,
60 cm,
60 m,

So sometimes there is a space between the number and the unit, and sometimes there isn't. How can I add an underscore in each case, so that the result would be:

23_cm,
23_m,
60_cm,
60_m

The search pattern for part of it is probably (\d) (?:cm|m), but I can't figure out the rest.

CodePudding user response:

I suggest replacing matches of

(?<=\d) ?(?=c?m,)

with an underscore. If a space is present it is matched; else the (zero-width) location between the last digit and 'cm' or 'm' is matched.

The regular expression can be broken down as follows. (I have enclosed the space in a character class to make it visible to the reader.)

(?<=     # begin a positive lookbehind    
  \d     # match a digit
)        # end positive lookbehind
[ ]?     # optionally match a space
(?=      # begin a positive lookahead  
  c?m,   # optionally match a 'c' followed by 'm,'
)        # end positive lookahead

If the comma is not always present, replace (?=c?m,) with (?=c?m\b), where \b is a word boundary.
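
As a minimal sketch, the lookaround pattern can be tried with Python's re module. The choice of Python and the names WITH_COMMA, WITH_BOUNDARY and add_underscore are assumptions made for illustration; the answer itself does not name a language.

```python
import re

# Pattern from the answer: an optional space between a digit and 'cm,' or 'm,'.
WITH_COMMA = re.compile(r"(?<=\d) ?(?=c?m,)")
# Variant for input where the comma may be absent.
WITH_BOUNDARY = re.compile(r"(?<=\d) ?(?=c?m\b)")

def add_underscore(text, pattern=WITH_COMMA):
    """Replace the matched space (or empty position) with an underscore."""
    return pattern.sub("_", text)

print(add_underscore("23cm, 23m, 60 cm, 60 m,"))
# 23_cm, 23_m, 60_cm, 60_m,
print(add_underscore("23cm 60 m and 5 men", WITH_BOUNDARY))
# 23_cm 60_m and 5 men
```

Because only the space (or the empty position) is matched, the replacement is a plain underscore and no backreferences are needed.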

CodePudding user response:

We can use capturing groups. The following example uses \2 and \3 for the capturing groups. Some languages would use $2 and $3.
See https://regex101.com/r/KxYyrb/1

input string

23cm, 23m, 60 cm, 60 m,

pattern

((\d+)\s?(m|cm))

replace using

\2_\3

output

23_cm, 23_m, 60_cm, 60_m,
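
A rough Python version of this replacement might look as follows. It assumes the digit group is meant to be (\d+), that is, one or more digits; Python and the variable names are illustrative choices rather than part of the answer.

```python
import re

text = "23cm, 23m, 60 cm, 60 m,"

# Group 1 is the whole match, group 2 the digits, group 3 the unit.
# Assumption: the digit group is (\d+), i.e. one or more digits.
pattern = re.compile(r"((\d+)\s?(m|cm))")

# Python writes backreferences in the replacement as \2 and \3.
result = pattern.sub(r"\2_\3", text)
print(result)  # 23_cm, 23_m, 60_cm, 60_m,
```

The outer group 1 is never used in the replacement, so the same result can be had with two groups and \1_\2.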

CodePudding user response:

You can use

(\d)\s?(c?m)\b

The replacement pattern is $1_$2.

See the regex demo.

Details:

  • (\d) - Capturing group 1: a digit
  • \s? - an optional whitespace char
  • (c?m) - Capturing group 2: an optional c and an m
  • \b - a word boundary (otherwise the regex would also match the m in "men", for example).
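
To illustrate why the word boundary matters, here is a small Python sketch comparing the pattern with and without \b. Python is an assumed language here, and it writes the replacement as \1_\2 rather than $1_$2; the sample sentence is made up for the comparison.

```python
import re

sample = "23cm, 60 m, 5 men"  # hypothetical input containing a false friend

with_boundary = re.compile(r"(\d)\s?(c?m)\b")
without_boundary = re.compile(r"(\d)\s?(c?m)")

# With \b, the 'm' in 'men' is not treated as a unit.
fixed = with_boundary.sub(r"\1_\2", sample)
print(fixed)  # 23_cm, 60_m, 5 men

# Without \b, '5 men' is rewritten as well.
overeager = without_boundary.sub(r"\1_\2", sample)
print(overeager)  # 23_cm, 60_m, 5_men
```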