I'm trying to parse some attributes from a modem's AT output. My regex is as follows:
([^:]*):\s*([^\s]*)
Sample output as follows:
LTE SSC1 bw : 20 MHz LTE SSC1 chan: 2850
LTE SSC2 state:INACTIVE LTE SSC2 band: B20
LTE SSC2 bw : 10 MHz LTE SSC2 chan: 6300
EMM state: Registered Normal Service
RRC state: RRC Connected
IMS reg state: NOT REGISTERED IMS mode: Normal
This mostly works, but not when an attribute's value has more characters after the first whitespace. For example, the match "LTE SSC2 bw" has a group 2 value of "10" when it should be "10 MHz".
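To see why the original pattern falls short, here is a minimal sketch using Python's re module. The question doesn't name a language, so Python is an assumption, as is the four-space gap between the two pairs on the line. [^\s]* stops at the first whitespace, so the value is cut to "10", and the leftover " MHz" is swept into the next attribute name:

```python
import re

# The pattern from the question: [^\s]* stops at the first whitespace.
QUESTION_RE = re.compile(r"([^:]*):\s*([^\s]*)")

# One line of the sample output; the four-space gap between pairs is an assumption.
line = "LTE SSC2 bw : 10 MHz    LTE SSC2 chan: 6300"

pairs = QUESTION_RE.findall(line)
for key, value in pairs:
    print(repr(key), "->", repr(value))
# 'LTE SSC2 bw ' -> '10'
# ' MHz    LTE SSC2 chan' -> '6300'
```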
Ideally, I need the regex to match exactly the attributes and capture each one's value in a group.
Hope this makes sense and thanks for your help.
Answer:
If there are always at least two spaces between the key-value pairs on a line, you can use
([^:\s][^:]*):[^\S\r\n]*(\S+(?:[^\S\r\n]\S+)*)
See the regex demo.
Details:
- ([^:\s][^:]*) - Group 1: a char other than whitespace and :, then zero or more chars other than :
- : - a colon
- [^\S\r\n]* - zero or more whitespace chars other than CR and LF
- (\S+(?:[^\S\r\n]\S+)*) - Group 2: one or more non-whitespace chars, then zero or more repetitions of a single whitespace char other than CR and LF followed by one or more non-whitespace chars
Because each repetition allows only one whitespace char, a run of two or more spaces ends the value.
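A minimal Python sketch of this answer's pattern. It assumes Python's re module and output in which pairs on the same line are separated by at least two spaces. The sample below is re-spaced with four spaces for that reason, since the real modem spacing isn't visible in the question. The parse_at_output helper is a hypothetical name:

```python
import re

# The answer's pattern with the "+" quantifiers written out.
# Assumption: pairs that share a line are separated by at least two spaces.
PAIR_RE = re.compile(r"([^:\s][^:]*):[^\S\r\n]*(\S+(?:[^\S\r\n]\S+)*)")


def parse_at_output(text):
    """Return a dict mapping each attribute name to its value."""
    # Group 1 can end with a space (e.g. "LTE SSC1 bw "), so strip it.
    return {m.group(1).strip(): m.group(2) for m in PAIR_RE.finditer(text)}


# Sample output re-spaced with four spaces between pairs (assumed layout).
SAMPLE = (
    "LTE SSC1 bw : 20 MHz    LTE SSC1 chan: 2850\n"
    "LTE SSC2 state:INACTIVE    LTE SSC2 band: B20\n"
    "LTE SSC2 bw : 10 MHz    LTE SSC2 chan: 6300\n"
    "EMM state: Registered Normal Service\n"
    "RRC state: RRC Connected\n"
    "IMS reg state: NOT REGISTERED    IMS mode: Normal\n"
)

for name, value in parse_at_output(SAMPLE).items():
    print(f"{name!r}: {value!r}")
```

Using a dict assumes attribute names are unique within one dump, which holds for this sample.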
Answer:
You can try this regex:
(?<attribute>[A-Z]{3} [^:]+): *(?<value1>.*?)(?> {2,}|$)(?<value2>[^:]+$)?
The groups you have are the following:
- Group 1 attribute: will contain the attribute name
- Group 2 value1: will contain the attribute value
- Group 3 value2: will contain the attribute's optional second value (as on the fourth line, EMM state)
Explanation:
- (?<attribute>[A-Z]{3} [^:]+) : Group 1
  - [A-Z]{3} : three uppercase letters, followed by a literal space
  - [^:]+ : one or more characters other than a colon
- : * : a colon, then any number of spaces
- (?<value1>.*?) : Group 2
  - .*? : any characters, lazily (it matches as few characters as possible while the rest of the pattern can still match)
- (?> {2,}|$) : atomic group (it consumes what it matches and is never backtracked into) that matches either
  - " {2,}" : two or more spaces (the end of the first attribute:value pair on a line)
  - | : or
  - $ : the end of the line (the end of the second attribute:value pair); on multi-line input this needs the multiline flag
- (?<value2>[^:]+$)? : optional Group 3
  - [^:]+ : one or more characters other than a colon
  - $ : the end of the line
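The (?> {2,}|$) group consumes the spaces it matches, whereas a lookahead (?= {2,}|$) would only check for them. The difference shows up in value2, as this small Python sketch illustrates. Assumptions: the four spaces between "Registered" and "Normal Service" are made up for illustration, named groups use Python's (?P<name>...) spelling, and a plain (?:...) group stands in for the atomic group because Python's re only accepts (?>...) from 3.11 on. Everything after that group is optional, so the engine never needs to backtrack into it, and the substitute matches the same text here.

```python
import re

LINE = "EMM state: Registered    Normal Service"

# Consuming version: the run of spaces becomes part of the match,
# so value2 starts at "Normal". (?:...) stands in for the atomic (?>...).
consuming = re.compile(
    r"(?P<attribute>[A-Z]{3} [^:]+): *(?P<value1>.*?)(?: {2,}|$)(?P<value2>[^:]+$)?"
)

# Lookahead version: the spaces are only checked, not consumed,
# so they end up at the start of value2.
lookahead = re.compile(
    r"(?P<attribute>[A-Z]{3} [^:]+): *(?P<value1>.*?)(?= {2,}|$)(?P<value2>[^:]+$)?"
)

print(repr(consuming.match(LINE)["value2"]))  # 'Normal Service'
print(repr(lookahead.match(LINE)["value2"]))  # '    Normal Service'
```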
You can refer to each group by its name.
Try it here.
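A minimal Python sketch of the whole pattern applied to a multi-line dump. It makes three syntax substitutions: named groups are written as (?P<name>...), the + quantifiers are written out, and a plain (?:...) group replaces the atomic (?>...) group so it runs on Python versions before 3.11. Nothing after that group can fail, so the matches are unchanged. The re.MULTILINE flag lets $ match at every line end. The sample spacing and the parse_at_output name are assumptions:

```python
import re

# The answer's pattern in Python syntax. re.MULTILINE makes $ match at the
# end of every line, not only at the end of the whole string.
ATTR_RE = re.compile(
    r"(?P<attribute>[A-Z]{3} [^:]+): *(?P<value1>.*?)(?: {2,}|$)(?P<value2>[^:]+$)?",
    re.MULTILINE,
)


def parse_at_output(text):
    """Map each attribute name to a list of its one or two values."""
    result = {}
    for m in ATTR_RE.finditer(text):
        values = [m["value1"]]
        if m["value2"]:  # only set when a second value follows 2+ spaces
            values.append(m["value2"])
        # The attribute group can end with a space ("LTE SSC1 bw "), so strip it.
        result[m["attribute"].strip()] = values
    return result


# Re-spaced sample (assumed layout): pairs separated by four spaces, and
# "Normal Service" set off from "Registered" by four spaces as well.
SAMPLE = (
    "LTE SSC1 bw : 20 MHz    LTE SSC1 chan: 2850\n"
    "LTE SSC2 state:INACTIVE    LTE SSC2 band: B20\n"
    "LTE SSC2 bw : 10 MHz    LTE SSC2 chan: 6300\n"
    "EMM state: Registered    Normal Service\n"
    "RRC state: RRC Connected\n"
    "IMS reg state: NOT REGISTERED    IMS mode: Normal"
)

for name, values in parse_at_output(SAMPLE).items():
    print(name, values)
```

The sample deliberately has no trailing newline. With one, [^:]+$ would capture a lone "\n" as value2 for the last attribute, so calling text.strip() on real output before parsing is a sensible guard.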