Regex to match two or more spaces

Time:05-19

I'm trying to parse some attributes from a modem's AT output. My regex is as follows:

([^:]*):\s*([^\s]*)

Sample output:

LTE SSC1 bw  : 20 MHz           LTE SSC1 chan: 2850
LTE SSC2 state:INACTIVE         LTE SSC2 band: B20
LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300
EMM state:     Registered       Normal Service
RRC state:     RRC Connected
IMS reg state: NOT REGISTERED   IMS mode:    Normal

This mostly works, but not when an attribute's value contains more characters after the first whitespace. For example, the match for "LTE SSC2 bw" has a group 2 value of "10" when it should be "10 MHz".
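A minimal Python sketch that reproduces the problem on the "LTE SSC2 bw" line (Python is an assumption; the question does not name a language). \s* skips the space after the colon, but [^\s]* stops at the next space, and because findall resumes right after "10", the leftover " MHz" and the column padding end up inside the next key.

```python
import re

# The question's pattern: group 1 = everything up to a colon,
# group 2 = the first run of non-whitespace after the colon.
QUESTION_PATTERN = re.compile(r"([^:]*):\s*([^\s]*)")

line = "LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300"
pairs = QUESTION_PATTERN.findall(line)

for key, value in pairs:
    print(repr(key), "->", repr(value))
# First pair: 'LTE SSC2 bw  ' -> '10'  (the value stops at the first space)
# Second pair: the leftover ' MHz' plus the padding becomes part of the next key
```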

Ideally, the regex should match exactly the attribute names and capture each attribute's value in a group.

Hope this makes sense, and thanks for your help.

Answer:

If there are always at least two spaces between the key-value pairs, you can use

([^:\s][^:]*):[^\S\r\n]*(\S+(?:[^\S\r\n]\S+)*)

See the regex demo.

Details:

  • ([^:\s][^:]*) - Group 1: a character other than whitespace and :, then zero or more characters other than :
  • : - a colon
  • [^\S\r\n]* - zero or more whitespace characters other than CR and LF
  • (\S+(?:[^\S\r\n]\S+)*) - Group 2: one or more non-whitespace characters, then zero or more repetitions of a single whitespace character other than CR and LF followed by one or more non-whitespace characters (so a run of two or more spaces ends the value)
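A minimal Python sketch of this pattern on the sample output, assuming Python's re module (the pattern is written with + after each \S). Two details are additions, not part of the answer: because [^:]* can also match a line break, running the pattern over the whole text glues "Normal Service" onto the next key ("Normal Service\nRRC state"), so the sketch matches one line at a time; and Group 1 keeps the padding before the colon, so it is stripped. With per-line matching, the trailing "Normal Service" has no colon of its own and is simply not captured.

```python
import re

# The answer's pattern, with "+" after each \S:
#   group 1: a char that is neither whitespace nor ':', then any non-':' chars
#   group 2: runs of non-whitespace joined by single spaces/tabs, so two or
#            more whitespace chars in a row end the value
PAIR = re.compile(r"([^:\s][^:]*):[^\S\r\n]*(\S+(?:[^\S\r\n]\S+)*)")

SAMPLE = """\
LTE SSC1 bw  : 20 MHz           LTE SSC1 chan: 2850
LTE SSC2 state:INACTIVE         LTE SSC2 band: B20
LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300
EMM state:     Registered       Normal Service
RRC state:     RRC Connected
IMS reg state: NOT REGISTERED   IMS mode:    Normal
"""


def parse_pairs(text):
    """Return (key, value) tuples, matching one line at a time."""
    pairs = []
    for line in text.splitlines():  # [^:]* could otherwise run across a line break
        for key, value in PAIR.findall(line):
            pairs.append((key.strip(), value))  # drop the padding before the colon
    return pairs


for key, value in parse_pairs(SAMPLE):
    print(f"{key}: {value}")
```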

Answer:

You can try this regex:

(?<attribute>[A-Z]{3} [^:]+): *(?<value1>.*?)(?> {2,}|$)(?<value2>[^:]+$)?

It defines the following groups:

  • Group 1 attribute: will contain the attribute name
  • Group 2 value1: will contain the attribute value
  • Group 3 value2: will contain the optional second value (used by the fourth line, where "Normal Service" follows "Registered")

Explanation:

  • (?<attribute>[A-Z]{3} [^:]+): Group 1
    • [A-Z]{3}: three uppercase letters
    • " " (a literal space): a single space
    • [^:]+: one or more characters other than a colon
  • : *: a colon followed by any number of spaces
  • (?<value1>.*?): Group 2
    • .*?: any characters, matched lazily, so it takes as few as possible
  • (?> {2,}|$): an atomic group (unlike a lookahead, it consumes what it matches, so Group 3 starts after the spaces) that matches
    • " {2,}": two or more spaces (the end of the first attribute:value pair on a line)
    • |: or
    • $: the end of the string, or of each line with the multiline flag (the end of the second attribute:value pair)
  • (?<value2>[^:]+$)?: Group 3 (optional)
    • [^:]+: one or more characters other than a colon
    • $: the end of the string (or line)

You can refer to each group by its name.

Try it here.
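A Python sketch of the same approach, assuming Python's re module, which changes the syntax in two places: named groups are written (?P<name>...), and atomic groups (?>...) only exist in Python 3.11 and later, so the sketch uses a plain (?:...) group. Because value2 is optional, the engine tries skipping it before it would backtrack into the spaces, so the plain group should give the same matches on this sample. Each line is matched on its own so that $ means the end of that line, and the attribute's padding before the colon is stripped.

```python
import re

# The answer's pattern in Python syntax (assumed adaptation):
#   (?<name>...) -> (?P<name>...), atomic (?>...) -> plain (?:...), "+" restored
ATTR = re.compile(
    r"(?P<attribute>[A-Z]{3} [^:]+): *"  # three capitals, a space, then non-':' chars
    r"(?P<value1>.*?)(?: {2,}|$)"        # shortest value followed by 2+ spaces or the end
    r"(?P<value2>[^:]+$)?"               # optional trailing value with no ':' of its own
)

SAMPLE = """\
LTE SSC1 bw  : 20 MHz           LTE SSC1 chan: 2850
LTE SSC2 state:INACTIVE         LTE SSC2 band: B20
LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300
EMM state:     Registered       Normal Service
RRC state:     RRC Connected
IMS reg state: NOT REGISTERED   IMS mode:    Normal
"""


def parse_attributes(text):
    """Return (attribute, value1, value2) tuples; value2 is None when absent."""
    rows = []
    for line in text.splitlines():  # match per line so $ is the end of each line
        for m in ATTR.finditer(line):
            rows.append((m["attribute"].strip(), m["value1"], m["value2"]))
    return rows


for row in parse_attributes(SAMPLE):
    print(row)
```

On this sample, only the "EMM state" row gets a second value ("Normal Service"); every other row has value2 set to None.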
