Home > Software design >  How to extract string between space and symbol '>'?
How to extract string between space and symbol '>'?

Time:09-16

String 1:

 [impro:0,grp:00,time:0xac,magic:0x00ac] CAR<7:5>|BIKE<4:0>,orig:0x8c,new:0x97

String 2:

[impro:0,grp:00,time:0xbc,magic:0x00bc] CAKE<4:0>,orig:0x0d,new:0x17

In string 1, I want to extract CAR<7:5 and BIKE<4:0,

In string 2, I want to extract CAKE<4:0

Any regex for this in Python?

CodePudding user response:

You can use \w <[^>]

enter image description here

  • \w matches any word character (equivalent to [a-zA-Z0-9_])
  • matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy).
  • < matches the character <
  • [^>] Match a single character not present in the list
  • matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)

CodePudding user response:

We can use re.findall here with the pattern (\w .*?)>:

inp = ["[impro:0,grp:00,time:0xac,magic:0x00ac] CAR<7:5>|BIKE<4:0>,orig:0x8c,new:0x97", "[impro:0,grp:00,time:0xbc,magic:0x00bc] CAKE<4:0>,orig:0x0d,new:0x17"]
for i in inp:
    matches = re.findall(r'(\w <.*?)>', i)
    print(matches)

This prints:

['CAR<7:5', 'BIKE<4:0']
['CAKE<4:0']

CodePudding user response:

In the first example, the BIKE part has no leading space but a pipe char.

A bit more precise match might be asserting either a space or pipe to the left, and match the digits separated by a colon and assert the > to the right.

(?<=[ |])[A-Z] <\d :\d (?=>)

In parts, the pattern matches:

  • (?<=[ |]) Positive lookbehind, assert either a space or a pipe directly to the left
  • [A-Z] Match 1 chars A-Z
  • <\d :\d Match < and 1 digits betqeen :
  • (?=>) Positive lookahead, assert > directly to the right

Regex demo

Or the capture group variant:

(?:[ |])([A-Z] <\d :\d)>

Regex demo

  • Related