Hi, I'm trying to find every line that starts with "CGK / WIII", but I can only find the first one.
What's wrong with my text? (It was extracted from a PDF file.)
I am using Python with the invoice2data package to extract data from PDF invoices into a dataframe, and I ran into a problem with the text extracted from one PDF file.
First I tried this regex: \w{3}\s\/[\s\w{4}]*
and found that it matches only 1 line.
Then I also tried the fixed text "CGK / WIII", which should find 4 matches, but it does NOT.
I think there may be font differences in my text, but I'm not sure.
CodePudding user response:
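When I turn on the global flag ("Don't return after the first match") in your linked example, it shows 4 matches.
Also, you cannot use a quantifier such as {4} inside a character set (inside []). There, the characters {, 4 and } are matched literally instead of acting as a repeat count.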
I'd do it like this:
\w{3}\s/\s\w{4}
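
In Python, the equivalent of the global flag is to use re.findall (or re.finditer) instead of re.search, which stops at the first match. Here is a minimal sketch of that difference. The sample text is made up for illustration, since the real PDF output isn't shown here, and the variable names are only placeholders:

```python
import re

# Hypothetical sample standing in for the text extracted from the PDF.
text = """CGK / WIII 01JAN
DPS / WADD 02JAN
CGK / WIII 03JAN
SUB / WARR 04JAN
CGK / WIII 05JAN
CGK / WIII 06JAN
"""

# Corrected pattern: three word chars, whitespace, a slash, whitespace, four word chars.
pattern = re.compile(r"\w{3}\s/\s\w{4}")

# re.search returns only the first match (like a regex tester without the global flag).
first = pattern.search(text)

# re.findall returns every non-overlapping match (like the global flag).
all_codes = pattern.findall(text)

# To get only the lines that start with the fixed text, anchor with ^ and use
# re.MULTILINE so that ^ and $ match at every line boundary.
cgk_lines = re.findall(r"^CGK / WIII.*$", text, flags=re.MULTILINE)

# The original pattern: inside [], "{4}" is three literal characters, and \s also
# matches newlines, so the greedy * runs past the end of the line.
greedy = re.search(r"\w{3}\s/[\s\w{4}]*", text).group()

print(first.group())   # first match only
print(len(all_codes))  # every code pair in the sample
print(cgk_lines)       # only the lines starting with "CGK / WIII"
print(repr(greedy))    # shows the match spilling onto the next line
```

If the pattern goes into an invoice2data template instead, check how that package applies its regexes before assuming it behaves like findall.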