Python Regex - Matching tokens in different lines of a file

Time:09-02

In a file, I have the following lines:

[Line 1] My Name is Adam;
[Line 2] <Blank Line>
[Line 3] My Name 
[Line 4] is Adam Lee;
[Line 5] <Blank Line>
[Line 6] My
[Line 7] Name
[Line 8] is
[Line 9] Adam
[Line 10] Lee;

My tokens are 'My', 'Name' and 'Adam', and I know that each statement ends with ';'.

Here is how I have written my code in Python:

import re
import sys

# Read the whole input file into a single string
try:
    file_path = sys.argv[1]
    with open(file_path) as f:
        my_file = f.read()
except Exception as err:
    print("Exception caught while opening the file!")
    print(repr(err))
    sys.exit(1)

# Find matches 
my_regex = r"^[ ]*My\s+Name.*Adam.*[;/]"
matches = re.findall(my_regex, my_file, flags=re.IGNORECASE | re.MULTILINE)

Observation: Only Line 1 is matched. I expected Lines 3-4 and Lines 6-10 to match as well, since they contain the tokens and end with the delimiter. How can I modify my regex? Please help.

CodePudding user response:

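By default the dot does not match a newline, so .* cannot cross a line break, and that is why only Line 1 matches. You might write the pattern using a negated character class that matches any char except a semicolon (newlines included):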

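^ *My\s+Name[^;]*Adam[^;]*;
  • ^ Start of the string (or of each line when re.MULTILINE is set)
  • " *" (a space followed by *) Match optional leading spaces
  • My\s+Name Match My and Name with 1+ whitespace chars in between (a newline counts as whitespace)
  • [^;]*Adam[^;]* Match Adam between optional chars other than ; (these can be newlines)
  • ; Match the closing ;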
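
As a quick check, here is a minimal sketch that applies the pattern above to the sample lines from the question. The text is inlined as a string instead of being read from a file, and the names text and pattern are only illustrative.

```python
import re

# Sample text reproducing the three layouts from the question
# (inlined here; in the original code it comes from a file).
text = (
    "My Name is Adam;\n"
    "\n"
    "My Name \n"
    "is Adam Lee;\n"
    "\n"
    "My\n"
    "Name\n"
    "is\n"
    "Adam\n"
    "Lee;\n"
)

# [^;]* and \s+ both match newlines, so a match may span several lines.
# re.MULTILINE lets ^ anchor at the start of every line.
pattern = re.compile(
    r"^ *My\s+Name[^;]*Adam[^;]*;",
    re.IGNORECASE | re.MULTILINE,
)

matches = pattern.findall(text)
for m in matches:
    print(repr(m))
```

Each match runs from a line that starts with My up to the next semicolon, so all three statements are found. Note that the match is delimited only by the semicolon: a statement that starts with My Name but has no Adam before its own semicolon simply will not match.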
