Python regex to find all strings that start with ' and end with '.tr ignoring leading and

I am struggling to get the correct regex for my script. I would like to find all substrings in a file that start with ' and end with '.tr, and save all of these matches in a list.

This is what I've got so far:

import glob
import pathlib
import re
       
libPathString = str(pathlib.Path.cwd().parent.resolve())

for path in glob.glob(libPathString + "/**", recursive=True):
    if(".dart" in path):
        with open(path, 'r+', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
            data = ''.join(data)
            words = re.findall(r'\'.*\'.tr', data)
            print(words)

The first problem is that words does not contain just the matching substring, but everything in the file up to and including it.

It also matches this snippet:

  child: Hero(
    tag: heroTag ?? '',  // <- because of this and the line below starts with `tr`
    transitionOnUserGestures: true,
    child: Material(

But this should not match!

And then it is not finding this:

  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,

This one should match!

What am I missing here?

CodePudding user response：

You can use

words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)

Details:

'[^'\\]*(?:\\.[^'\\]*)*' - a ', zero or more chars other than ' and \, and then zero or more sequences of a \ followed by any single char and then zero or more chars other than ' and \ (this matches a string between ' chars, including any escaped chars inside it)
\s* - zero or more whitespace chars (this matches any whitespace, including line breaks)
\.tr - the .tr string (note the escaped . that now matches a literal dot)
\b - word boundary.
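
As a rough sketch of how this pattern could be applied to the two snippets from the question: the names TR_STRING, find_tr_strings and collect_from_dir are made up for illustration, and the sketch assumes the files are read whole (newlines kept), since \s* already covers the line break before .tr. The original pattern misfired because its unescaped . matched the comma in '',tr and its greedy .* ran across the file.

```python
import re
from pathlib import Path

# Pattern from the answer above: a single-quoted literal (escapes allowed),
# optional whitespace including newlines, then a literal ".tr" as a whole word.
TR_STRING = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")

HERO_SNIPPET = """\
  child: Hero(
    tag: heroTag ?? '',
    transitionOnUserGestures: true,
    child: Material(
"""

AUTOSIZE_SNIPPET = """\
  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,
"""


def find_tr_strings(text):
    """Return every quoted literal that is followed by .tr (hypothetical helper)."""
    return TR_STRING.findall(text)


def collect_from_dir(root):
    """Collect matches from all .dart files below root (hypothetical helper)."""
    found = []
    for path in sorted(Path(root).rglob("*.dart")):
        if path.is_file():
            found.extend(find_tr_strings(path.read_text(encoding="utf-8")))
    return found


print(find_tr_strings(HERO_SNIPPET))      # no match expected here
print(find_tr_strings(AUTOSIZE_SNIPPET))  # one match, spanning the line break
```

Each match still contains the quotes, the whitespace and the trailing .tr; wrapping the quoted part in a capture group would return only the literal.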

CodePudding user response：

You can try this

\s*'(.+?)'\s*\.tr

However, it looks like your goal is to extract the strings to be translated from a .dart file. I think it would be more elegant to use a library that can parse the AST of the Dart language for this purpose.
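
For comparison, here is a small sketch of how the lazy pattern from this answer behaves with re.findall. LAZY_TR and extract_lazy are names invented for the example. Because the pattern contains one capture group, findall returns only the text between the quotes. Note that .+? is allowed to run across an earlier closing quote, so two literals on one line can end up merged into a single result.

```python
import re

# Lazy pattern from this answer; the group captures the text between the quotes.
LAZY_TR = re.compile(r"\s*'(.+?)'\s*\.tr")


def extract_lazy(text):
    """Return the captured group for every match (hypothetical helper)."""
    return LAZY_TR.findall(text)


simple = "Text('Hello'.tr)"
two_literals = "tag: 'a', text: 'b'.tr"

print(extract_lazy(simple))        # ['Hello']
print(extract_lazy(two_literals))  # ["a', text: 'b"]
```

The second result is one reason the character-class pattern from the first answer, or a real Dart parser, may be the safer choice for source files with several string literals per line.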