I am struggling to get the correct regex for my script. I would like to find all substrings in a file that start with a ' and end with '.tr, and save all of these matches in a list.
This is what I've got so far:
import glob
import pathlib
import re

libPathString = str(pathlib.Path.cwd().parent.resolve())
for path in glob.glob(libPathString + "/**", recursive=True):
    if ".dart" in path:
        with open(path, 'r', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
        data = ''.join(data)
        words = re.findall(r'\'.*\'.tr', data)
        print(words)
The first problem is that words is not just the matching substring but the whole file up to the substring.
It also matches this snippet:
child: Hero(
  tag: heroTag ?? '', // <- because of this and the line below starts with `tr`
  transitionOnUserGestures: true,
  child: Material(
But this should not match!
And it does not find this:
AutoSizeText(
  'Das ist ein langer Text, der immer in einer Zeile ist.'
      .tr,
  style: AppTextStyles.montserratH4Regular,
This one should match!
What am I missing here?
CodePudding user response:
You can use
words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)
See the Python demo. Details:
'[^'\\]*(?:\\.[^'\\]*)*' - a ', zero or more chars other than ' and \, and then zero or more sequences of a \ followed by any single char and any zero or more chars other than ' and \ (this matches strings between ' chars with any escaped chars in between)
\s* - zero or more whitespace characters (this matches any whitespace, including line breaks)
\.tr - a .tr string (note the escaped . that now matches a literal dot)
\b - a word boundary.
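To see how the pattern above behaves on the two snippets from the question, here is a minimal sketch. The `sample` string and the name `TR_STRING` are only illustrative; the Dart lines are copied from the question with assumed indentation.

```python
import re

# Pattern from the answer above: a single-quoted string (escapes allowed),
# optional whitespace (including line breaks), then ".tr" as a whole word.
TR_STRING = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")

# Sample input assembled from the two snippets in the question.
sample = """\
child: Hero(
  tag: heroTag ?? '', // <- because of this and the line below starts with `tr`
  transitionOnUserGestures: true,
  child: Material(
AutoSizeText(
  'Das ist ein langer Text, der immer in einer Zeile ist.'
      .tr,
  style: AppTextStyles.montserratH4Regular,
"""

# The pattern has no capturing group, so findall returns the full matches.
matches = TR_STRING.findall(sample)
print(len(matches))   # number of translated strings found
for match in matches:
    print(repr(match))
```

Only the AutoSizeText string should be reported, and the empty '' in the Hero snippet should be skipped. Because \s* already spans line breaks, the file contents could probably be passed in unchanged (for example from file.read()) instead of stripping and joining the lines first. In that case each match keeps the line break and indentation between the closing quote and .tr.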
CodePudding user response:
You can try this:
\s*'(.+?)'\s*\.tr
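A quick check of this pattern, written here with a lazy (.+?) group, which is an assumption about the intended quantifier. The input strings and the name `LAZY_TR` are made up for illustration.

```python
import re

# Assumption: the group uses the lazy quantifier "+?".
LAZY_TR = re.compile(r"\s*'(.+?)'\s*\.tr")

# With one capturing group, findall returns only the captured text,
# i.e. the string contents without the quotes and without ".tr".
simple = "AutoSizeText('Hello World'\n    .tr, style: s)"
simple_result = LAZY_TR.findall(simple)
print(simple_result)   # ['Hello World']

# Caveat: "." also matches a quote character, so a match can start at an
# earlier quoted string on the same line and run up to the one before ".tr".
tricky = "f('x', 'y'.tr)"
tricky_result = LAZY_TR.findall(tricky)
print(tricky_result)   # ["x', 'y"]
```

The second case suggests this simpler pattern is best suited to lines that contain a single quoted string. A character class that excludes the quote avoids the overrun.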
However, it looks like your goal is to extract the strings to be translated from a .dart file. I think it would be more elegant to use a library that can parse the AST of the Dart language for this purpose.