Regex match repeating patterns but not last occurrence

Time:06-22

I would like to extract the following patterns:

  1. one or more initial letters (Subgroup 1); and then

  2. followed by digits, any number of them (Subgroup 2);

  3. followed by letters, any number of them (Subgroup 3);

  4. with 2 and 3 repeating any number of times.

I am using https://regexr.com/ to test.

Here are some sample strings and my expected output.

String: FAF46ABC7787AAAA  =>   Desired output: FAF46ABC7787

String: FAF46ABC7787      =>   Desired output: FAF46ABC

String: FAF46ABC          =>   Desired output: FAF46

String: FAF46             =>   Desired output: FAF

String: FAF               =>   Desired output: FAF

String: FAF46 GG(Not CC)  =>   Desired output: FAF

String: FAF46.doc         =>   Desired output: FAF
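
Read as a rule, the samples above appear to say: take the leading alphanumeric token, split it into alternating runs of letters and digits, and drop the final run unless it is the only one. The following is a minimal Python sketch of that reading, not a regex-only solution. The function name `drop_last_run` is hypothetical, and treating lowercase letters the same as uppercase is an assumption.

```python
import re
from typing import Optional


def drop_last_run(text: str) -> Optional[str]:
    """Return the leading token minus its final letter/digit run, or None."""
    # Assumption: only the leading alphanumeric token matters,
    # and it must start with a letter.
    head = re.match(r'[A-Za-z][A-Za-z0-9]*', text)
    if head is None:
        return None
    # Split the token into alternating runs of letters and digits.
    runs = re.findall(r'[A-Za-z]+|[0-9]+', head.group())
    # Drop the final run unless it is the only one ("FAF" stays "FAF").
    if len(runs) > 1:
        runs = runs[:-1]
    return ''.join(runs)


for sample in ['FAF46ABC7787AAAA', 'FAF46', 'FAF', 'FAF46.doc']:
    print(f'{sample!r} -> {drop_last_run(sample)!r}')
```

A sketch like this can serve as a reference to check candidate patterns against.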

I tested the following, but none of them work:

  1. Lookahead method suggested by "Python regex matching all but last occurrence"

1a. ^([a-zA-Z]+)([0-9]*[a-zA-Z]*)(?=[0-9]+|[a-zA-Z]+)

1b. ^([a-zA-Z]+)(([0-9])*([a-zA-Z])*)(?=[0-9]+|[a-zA-Z]+)

  2. Capture all subgroups and exclude last occurrence by loop

2a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*

  3. Using replace method

3a. (^(?:[a-zA-Z]+[0-9]*)(?:[a-zA-Z]+[0-9]*)*)([a-zA-Z]+|[0-9]+) and replace by $1

  4. Exclude ending occurrence by using non-capturing group

4a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9]+|[a-zA-Z]+)$

4b. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:([0-9]+|[a-zA-Z]+))$

4c. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9a-zA-Z]+)$

4d. ^([a-zA-Z]+)(([0-9]*?)([a-zA-Z]*?))*(?:[0-9a-zA-Z]+)$

I also tried switching between greedy and lazy quantifiers to see if a miracle would happen, but no luck.

I thought this would be an easy task, but it is obviously harder than I expected.

I would appreciate any kind of help.

Please note that I do not have extended regex available, in case a solution would need it. Thank you.

CodePudding user response:

You can search using this regex:

^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*

and replace with $1

RegEx Details:

  • ^: Start
  • (: Start capture group #1
    • [a-zA-Z]+: Match 1+ letters
    • [0-9a-zA-Z]*?: Match 0 or more letters or digits (non-greedy)
  • ): End 1st capture group
  • (?:: Start non-capture group
    • [0-9]+: Match 1+ digits
    • |: OR
    • [A-Z]*: Match 0 or more uppercase letters
  • ): End non-capture group
  • \b: Word boundary
  • .*: Match anything remaining
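
To try this search-and-replace outside of an online tester, here is a small Python sketch. It assumes Python's `re` module, where the replacement `$1` is written `\1`; the helper name `strip_last_run` is hypothetical. The pattern itself is the one from this answer, unchanged.

```python
import re

# Pattern from the answer above; group 1 is the part to keep.
PATTERN = re.compile(r'^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*')


def strip_last_run(text: str) -> str:
    """Replace the whole match with capture group 1 (the "$1" replacement)."""
    # If the pattern does not match, the text is returned unchanged.
    return PATTERN.sub(r'\1', text)


samples = [
    'FAF46ABC7787AAAA',
    'FAF46ABC7787',
    'FAF46ABC',
    'FAF46',
    'FAF',
    'FAF46 GG(Not CC)',
    'FAF46.doc',
]
for sample in samples:
    print(f'{sample!r} -> {strip_last_run(sample)!r}')
```

Note that the trailing `[A-Z]*` only matches uppercase letters, so a final run of lowercase letters would not be dropped by this pattern; all the sample strings use uppercase.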