Regex capture line with multiple spaces

Time:11-18

I'm trying to capture each of the 2 lines below from the beginning up to, but not including, the AJ token.

TSA01-19AUG80/F/LEE/ANGIE/JEAN AJ 17NOV 2124Z
TSA01-19AUG80/F/LEE/ANGIE/JEAN MARIE AJ 17NOV 2124Z

The end of the line (AJ 17NOV 2124Z) is not constant and may be different every time.

I was able to capture this line by using this pattern: TSA01-([^\s]+)

TSA01-19AUG80/F/LEE/ANGIE/JEAN AJ 17NOV 2124Z

But I'm stuck when someone has a space in their first name, like below. How do I capture the second name without also capturing AJ 17NOV 2124Z?

TSA01-19AUG80/F/LEE/ANGIE/JEAN MARIE AJ 17NOV 2124Z

CodePudding user response:

Try:

.*(?=(?:\s+\S+){3}$)


This matches everything up to the last three whitespace-separated words at the end of the string.
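As a quick check, here is a minimal Python sketch of that pattern applied to the two sample lines from the question. The helper name strip_last_three_words is made up for illustration, and the sketch assumes every line ends with exactly three trailing tokens (AJ, date, time), which the question does not guarantee.

```python
import re

# Sample lines taken from the question.
LINES = [
    "TSA01-19AUG80/F/LEE/ANGIE/JEAN AJ 17NOV 2124Z",
    "TSA01-19AUG80/F/LEE/ANGIE/JEAN MARIE AJ 17NOV 2124Z",
]

# Greedy .* backtracks until exactly three "whitespace + word" groups
# remain between the current position and the end of the string.
TRAILING_THREE = re.compile(r".*(?=(?:\s+\S+){3}$)")


def strip_last_three_words(line):
    """Hypothetical helper: return the line without its last three words, or None."""
    m = TRAILING_THREE.match(line)
    return m.group(0) if m else None


for line in LINES:
    print(strip_last_three_words(line))
```

If the number of trailing tokens can vary, this approach would cut at the wrong place, so it only fits data where the tail always has three words.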

CodePudding user response:

Perhaps it would suffice to capture everything before the date-like pattern 17NOV, which consists of 1 or 2 digits followed by 1 or more characters A-Z:

^TSA01-.*?(?=\s+\d{1,2}[A-Z]+\b)

Explanation

  • ^ Start of string
  • TSA01- Match literally
  • .*? Match any character except a newline, as few as possible
  • (?= Positive lookahead, assert that to the right is:
    • \s+\d{1,2}[A-Z]+\b Match 1 or more whitespace chars, 1-2 digits, 1 or more chars A-Z, then a word boundary
  • ) Close the lookahead
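A small Python sketch of this lookahead on the question's sample lines is shown below. Note that, as written, the match stops just before the date token, so the AJ token is still part of the result. The second pattern, BEFORE_CODE, is not from the answer: it is a hypothetical variant that assumes exactly one non-space token (such as AJ) always sits directly before the date.

```python
import re

# Sample lines taken from the question.
LINES = [
    "TSA01-19AUG80/F/LEE/ANGIE/JEAN AJ 17NOV 2124Z",
    "TSA01-19AUG80/F/LEE/ANGIE/JEAN MARIE AJ 17NOV 2124Z",
]

# Lazy .*? expands until the lookahead sees whitespace followed by a
# date-like token (1-2 digits, then uppercase letters, e.g. 17NOV).
BEFORE_DATE = re.compile(r"^TSA01-.*?(?=\s+\d{1,2}[A-Z]+\b)")

# Assumed variant (not in the answer): also skip one token before the date.
BEFORE_CODE = re.compile(r"^TSA01-.*?(?=\s+\S+\s+\d{1,2}[A-Z]+\b)")


def capture(pattern, line):
    """Return the matched prefix, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group(0) if m else None


for line in LINES:
    print(capture(BEFORE_DATE, line))  # ends with " AJ"
    print(capture(BEFORE_CODE, line))  # ends with the name
```

Anchoring on the date shape rather than on a word count tolerates names with any number of spaces, at the cost of assuming the date always looks like digits followed by uppercase letters.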
