I would like to extract the following pattern:
It starts with one or more letters (Subgroup 1);
followed by digits of any length (Subgroup 2);
followed by letters of any length (Subgroup 3);
with 2 & 3 repeating any number of times.
I am using https://regexr.com/ to test.
Here are some sample strings and my expected output.
String: FAF46ABC7787AAAA => Desired output: FAF46ABC7787
String: FAF46ABC7787 => Desired output: FAF46ABC
String: FAF46ABC => Desired output: FAF46
String: FAF46 => Desired output: FAF
String: FAF => Desired output: FAF
String: FAF46 GG(Not CC) => Desired output: FAF
String: FAF46.doc => Desired output: FAF
I tested the following, but none of them work:
- Lookahead method suggested by "Python regex matching all but last occurrence"
1a. ^([a-zA-Z]+)([0-9]*[a-zA-Z]*)(?=[0-9]+|[a-zA-Z]+)
1b. ^([a-zA-Z]+)(([0-9])*([a-zA-Z])*)(?=[0-9]+|[a-zA-Z]+)
- Capture all subgroups and exclude last occurrence by loop
2a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*
- Using replace method
3a. (^(?:[a-zA-Z]+[0-9]*)(?:[a-zA-Z]+[0-9]*)*)([a-zA-Z]+|[0-9]+)
and replace by $1
- Exclude ending occurrence by using non-capturing group
4a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9]+|[a-zA-Z]+)$
4b. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:([0-9]+|[a-zA-Z]+))$
4c. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9a-zA-Z]+)$
4d. ^([a-zA-Z]+)(([0-9]*?)([a-zA-Z]*?))*(?:[0-9a-zA-Z]+)$
I also tried switching between greedy and lazy quantifiers to see if a miracle would happen, but no luck.
I thought it would be an easy task, but it is obviously harder than I expected.
I would appreciate any kind of help.
Please note that I do not have extended regex available, in case that is needed to make it work. Thank you.
CodePudding user response:
You can search using this regex:
^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*
and replace with $1
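As a rough sketch, the same search-and-replace could look like this in Python. This assumes Python's `re` module, where the replacement is written `\1` rather than `$1`; the helper name `drop_last_run` is made up for illustration, and the sample strings are the ones from the question.

```python
import re

# The pattern from this answer. In Python the back-reference in the
# replacement string is \1 (regexr and JavaScript use $1 instead).
PATTERN = re.compile(r'^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*')


def drop_last_run(text):
    """Hypothetical helper: keep capture group 1 and discard the rest."""
    return PATTERN.sub(r'\1', text)


samples = [
    'FAF46ABC7787AAAA',
    'FAF46ABC7787',
    'FAF46ABC',
    'FAF46',
    'FAF',
    'FAF46 GG(Not CC)',
    'FAF46.doc',
]

for sample in samples:
    print(f'{sample!r} => {drop_last_run(sample)!r}')
```

The lazy `*?` makes group 1 as short as possible, so it stops at the first position where a single run of digits or uppercase letters can reach a word boundary.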
RegEx Details:
^ : Start
( : Start capture group #1
[a-zA-Z]+ : Match 1+ letters
[0-9a-zA-Z]*? : Match 0 or more letters or digits (non-greedy)
) : End 1st capture group
(?: : Start non-capture group
[0-9]+ : Match 1+ digits
| : OR
[A-Z]* : Match 0 or more uppercase letters
) : End non-capture group
\b : Word boundary
.* : Match anything remaining
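
Note that the `[A-Z]*` alternative only covers uppercase letters, so a lowercase input such as faf46abc would come back unchanged. If that matters, one alternative is to split the leading alphanumeric prefix into runs of letters and digits and drop the last run. The Python sketch below is an assumption about how that could be done, not part of the regex above, and `drop_last_run_by_splitting` is a hypothetical name.

```python
import re


def drop_last_run_by_splitting(text):
    """Hypothetical helper: drop the last letter/digit run of the prefix."""
    # Take the leading prefix: one letter followed by any letters or digits.
    prefix = re.match(r'[a-zA-Z][0-9a-zA-Z]*', text)
    if prefix is None:
        # Assumption: input that does not start with a letter yields nothing.
        return ''
    # Split the prefix into alternating runs of letters and digits.
    runs = re.findall(r'[a-zA-Z]+|[0-9]+', prefix.group(0))
    # Drop the last run unless it is the only one ("FAF" stays "FAF").
    if len(runs) > 1:
        runs.pop()
    return ''.join(runs)


for sample in ['FAF46ABC7787AAAA', 'FAF46 GG(Not CC)', 'FAF46.doc', 'faf46abc']:
    print(f'{sample!r} => {drop_last_run_by_splitting(sample)!r}')
```

This trades the single-expression replace for two simpler patterns and avoids relying on backtracking to find the last run.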