I am a RegEx beginner and trying to identify the endings of different statements in sms. See screenshot below.
How can I avoid selecting the next letter following by a full-stop that indicates ending of a statement.
Note that some statements have <.><Alphabets>
while some have <.><space><Alphabets>
Regex used: r"\. ?[\D]"
Sample SMS: - I want to select just the full-stop and space if any.
Txn of USD 00.00 done using TC XX at POS*MERCH on 30-Feb-22. Avl bal:USD 00.00. Call xxxxxx for dispute or SMS BLOCK xxxx to xxxxxxx
Acct XX debited with USD XX.00 on some date.Info: ABC*BDECS-XYZ.Avbl Bal:USD yy,xxx.95.Call xxxxxx for dispute or SMS BLOCK xx to xxxxx
screenshot from RegExr on regular pattern
CodePudding user response:
What you're looking for is a look-ahead group. Whether you make that a positive look-ahead and use the negated character set \D
or a negative look-ahead with the character set \d
doesn't really matter- I'll outline both below:
regex = r". ?(?=\D)" # asserts that the following character matches \D
regex = r". ?(?!\d)" # asserts the following character does NOT match \d
There's also look-behind variants (?<!pattern)
and (?<=pattern)
, which assert that the pattern doesn't/does match just before the current position.
None of these groups capture the matched text- they just "look ahead" or "look behind" without changing state.