Apply negative lookbehind to the entire group before it

Time:04-30

I want to capture the model of a phone but not the storage in the title. So I don't want the regex to match xxxGB.

I am expecting to match:
iphone 13 from "iphone 13 256gb - midnight"
iphone 13 pro max from "iphone 13 pro max 256gb - sierra blue"
iphone 13 pro from "iphone 13 pro 128gb - graphite"
galaxy tab a8 from "galaxy tab a8 wifi 128gb - grey"

The regular expression I have is

r'[A-Za-z]+\s?[A-Za-z\ \.\d]*((\spro|\smax|\slight|\smini|\splus|\sultra|\[A-Za-z]?\d+(?!gb)))*|$'

but the negative lookahead (?!gb) is only applied to the last digit before "gb", not to the entire number after the space, so part of the storage size is still matched:

apple iphone 13 256gb - midnight
<re.Match object; span=(6, 18), match='iphone 13 25'>
<re.Match object; span=(32, 32), match=''>
apple iphone 13 pro 128gb - graphite
<re.Match object; span=(6, 22), match='iphone 13 pro 12'>
<re.Match object; span=(36, 36), match=''>
apple iphone 13 pro max 256gb - sierra blue
<re.Match object; span=(6, 26), match='iphone 13 pro max 25'>
<re.Match object; span=(43, 43), match=''>
samsung galaxy tab a8 wifi 128gb - grey
<re.Match object; span=(8, 21), match='galaxy tab a8'>
<re.Match object; span=(39, 39), match=''>
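
The truncated numbers in the output above ("25", "12") are consistent with regex backtracking. The following is a minimal sketch, assuming the number is matched with \d+ followed by (?!gb); the variable names are only illustrative. It also shows one possible way to reject the whole number by placing the lookahead in front of the digits.

```python
import re

# \d+ first takes "256", but (?!gb) fails there, so the engine backtracks
# to "25". The next characters are then "6g", not "gb", so the lookahead passes.
partial = re.search(r"\d+(?!gb)", "256gb")
print(partial.group())

# Putting the lookahead before the digits tests the whole token at once,
# so a storage size is rejected outright instead of being trimmed.
whole = re.search(r"\b(?!\d+gb)\d+", "256gb")
print(whole)

# A number that is not a storage size is still matched in full.
model_number = re.search(r"\b(?!\d+gb)\d+", "iphone 13 256gb")
print(model_number.group())
```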

The test setup can be found here: https://regex101.com/r/dn0Hyr/1

Many thanks!!

CodePudding user response:

You may use this regex to match phone models:

^[A-Za-z]+(?: (?!wifi|\d*gb)[\dA-Za-z]+)*

RegEx Details:

  • ^: Start of the string
  • [A-Za-z]+: Match 1+ letters
  • (?: (?!wifi|\d*gb)[\dA-Za-z]+)*: Match a space followed by 1+ letters or digits, as long as the word does not start with wifi or with optional digits followed by gb. Repeat this group 0 or more times
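
As a quick check, here is a minimal Python sketch that runs the pattern, written with its + quantifiers, against the four titles from the question. The names MODEL, titles and models are only illustrative, and the titles are assumed to be lowercase with no brand prefix.

```python
import re

# Leading word, then further space-separated words, stopping at the first
# word that starts with "wifi" or looks like a storage size (digits + "gb").
MODEL = re.compile(r"^[A-Za-z]+(?: (?!wifi|\d*gb)[\dA-Za-z]+)*")

titles = [
    "iphone 13 256gb - midnight",
    "iphone 13 pro max 256gb - sierra blue",
    "iphone 13 pro 128gb - graphite",
    "galaxy tab a8 wifi 128gb - grey",
]

# Every title starts with a letter, so match() always succeeds here.
models = [MODEL.match(title).group() for title in titles]
for title, model in zip(titles, models):
    print(f"{model!r} from {title!r}")
```

Because the pattern is anchored with ^, a title such as "apple iphone 13 256gb - midnight" would yield "apple iphone 13", so a brand prefix has to be handled separately. Passing re.IGNORECASE to re.compile would also cover titles that spell the storage as "256GB".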

CodePudding user response:

An alternation between two positive lookaheads:

/^.*(?=\swifi\s\d{3})|^.*(?=\s\d{3})/gm


Segment              Meaning
^.*                  From the start of a line, any character except a newline, zero or more times...
(?=\swifi\s\d{3})    ...is a match if it is followed by a whitespace character, the literal "wifi", a whitespace character, and 3 digits...
|                    OR
^.*                  ...from the start of a line, any character except a newline, zero or more times...
(?=\s\d{3})          ...is a match if it is followed by a whitespace character and 3 digits.
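
The pattern above is written as a JavaScript-style literal with the g and m flags. A rough Python equivalent, as a sketch only, uses re.MULTILINE with findall; the names PATTERN, text and found are illustrative.

```python
import re

# re.MULTILINE plays the role of the "m" flag, and findall() the role of "g".
PATTERN = re.compile(r"^.*(?=\swifi\s\d{3})|^.*(?=\s\d{3})", re.MULTILINE)

text = "\n".join([
    "iphone 13 256gb - midnight",
    "iphone 13 pro max 256gb - sierra blue",
    "iphone 13 pro 128gb - graphite",
    "galaxy tab a8 wifi 128gb - grey",
])

# One match per line: everything before " wifi NNN" or before " NNN".
found = PATTERN.findall(text)
print(found)

# The lookaheads require exactly three digits after the whitespace,
# so a two-digit storage size such as "64gb" produces no match.
two_digit = PATTERN.search("iphone se 64gb - red")
print(two_digit)
```

Since \d{3} is hard-coded, this approach depends on the storage size having at least three digits; loosening it to \d+ would also stop at the model number "13".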