Regex to remove white space from this exact pattern

Time:09-20

I am trying to remove the white space that appears after the ":" character in this header:

batman: 100
robin: OFXSGML
superman: 102
wonderwoman: NONE
joker: USASCII
harley: 1252
aquaman: NONE
flash: NONE
iris: NONE

This is the regex pattern I wrote to match this exact header, but I keep running into problems trying to delete the white space. Any help is appreciated.

^batman:\s100 robin:\sOFXSGML superman:\s102 wonderwoman:\s NONE joker:\sUSASCII harley: 1252 aquaman:\s NONE flash:\sNONE iris:\sNONE$

CodePudding user response:

Your pattern uses literal spaces between the fields, but the fields are on separate lines. To match all the lines, replace those spaces with \s wherever the pattern crosses a newline.

You can then post-process the match, replacing :\s with :, but note that this pattern is a very precise match.


If you want to be more flexible, you can use a capture group to capture everything before the : and then match the spaces after it.

^([^\s:]+:)[\p{Zs}\t]+(?=\S)

The pattern matches:

  • ^ Start of the string (or of each line when RegexOptions.Multiline is set)
  • ([^\s:]+:) Capture group 1, match 1 or more non-whitespace chars other than : and then match the :
  • [\p{Zs}\t]+ Match 1 or more spaces or tabs
  • (?=\S) Positive lookahead, assert a non-whitespace char to the right (if there has to be one, else you can omit this part)

In the replacement, use group 1, written as $1.
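A minimal sketch of how this pattern and replacement might be applied in C#, assuming the whole header is held in one multi-line string, so RegexOptions.Multiline is needed for ^ to match at the start of every line. The class and method names (HeaderCleaner, RemoveSpaceAfterColon) are hypothetical and only there for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderCleaner
{
    // Group 1 captures "key:". The spaces or tabs after it are matched but
    // not captured, so replacing the match with $1 drops them.
    private static readonly Regex AfterKey = new Regex(
        @"^([^\s:]+:)[\p{Zs}\t]+(?=\S)",
        RegexOptions.Multiline); // assumption: ^ should match at each line start

    public static string RemoveSpaceAfterColon(string header)
    {
        return AfterKey.Replace(header, "$1");
    }
}
```

Because the character class contains only horizontal whitespace, a line break after a colon is never consumed, so each key stays on its own line.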


CodePudding user response:

var yourString = @"batman: 100 robin: OFXSGML superman: 102 wonderwoman: NONE joker: USASCII harley: 1252 aquaman: NONE flash: NONE iris: NONE";
yourString = Regex.Replace(yourString, "(?<=:) ", "");

CodePudding user response:

Shouldn't be any more complex than:

string source = @"
batman: 100
robin: OFXSGML
superman: 102
wonderwoman: NONE
joker: USASCII
harley: 1252
aquaman: NONE
flash: NONE
iris: NONE
".Trim();

Regex  rx     = new Regex(@"(?<=:)\s+");
string result = rx.Replace(source, "");

  • (?<=:) is a zero-width positive lookbehind: it anchors the match on a :, without it being a part of the match.

  • \s+ matches 1 or more whitespace characters (including SP, HT, CR, LF, VT).

That changes:

batman: 100
robin: OFXSGML
superman: 102
wonderwoman: NONE
joker: USASCII
harley: 1252
aquaman: NONE
flash: NONE
iris: NONE

into

batman:100
robin:OFXSGML
superman:102
wonderwoman:NONE
joker:USASCII
harley:1252
aquaman:NONE
flash:NONE
iris:NONE

Alternatively, you can include the : in the match. It just changes the replacement text:

Regex  rx     = new Regex(@":\s+");
string result = rx.Replace(source, ":");
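One thing to watch for: since \s also matches CR and LF, a key with an empty value would have the next line pulled up onto it (for example, "flash:\niris: NONE" would become "flash:iris:NONE"). If that is not wanted, a possible variant is to match only spaces and tabs. This is a sketch under that assumption; the names ColonSpacing and Collapse are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColonSpacing
{
    public static string Collapse(string source)
    {
        // [ \t]+ matches spaces and tabs only, so line breaks after a colon
        // are left alone and every key stays on its own line.
        return Regex.Replace(source, @":[ \t]+", ":");
    }
}
```

For the header in the question, where every key has a value, both versions give the same result.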

If you care about the value of the key preceding the colon-plus-whitespace, use named capture groups and a match evaluator.

Here the regular expression (?<key>\w+)\s*:\s* matches:

  • (?<key>\w+) is a sequence of 1 or more word characters (letters, digits or _), captured in the group named key, followed by
  • \s* is zero or more whitespace characters, followed by
  • : is a literal colon character, followed by
  • \s* is zero or more whitespace characters

The match evaluator looks at the capturing group named key. If it is any of batman, robin, or superman, any whitespace preceding or following the colon is removed; otherwise, the match itself is returned unchanged.

Regex  rx     = new Regex(@"(?<key>\w+)\s*:\s*");
string result = rx.Replace(source, (Match m) => {
  string replacement;
  string key = m.Groups["key"].Value;
  
  switch (key) {
  case "batman":
  case "robin":
  case "superman":
    replacement = key + ":";
    break;
  default:
    replacement = m.Value;
    break;
  }
  
  return replacement;
});