I'm trying to extract locations from France. Here is a sample:
1#Tunisia#TS#TS#34#9#TS;4#Virsac, Aquitaine, France#FR#FR97#45.0333#-0.45#-1477568;4#Gironde, Aquitaine, France#FR#FR97#44.584#-0.089244#-1429418
It's basically a city, its region and its country. Hence, I did this:
^[2-5]#(.*?)#FR#
The result is:
Gironde, Aquitaine, France
This correctly extracts the city/region/country, but only one of them. Is it possible to extract multiple entries? The expected result would be:
Virsac, Aquitaine, France
Gironde, Aquitaine, France
Thanks in advance,
CodePudding user response:
Building off your current pattern, you need to replace the ^ anchor with a word boundary construct (to make sure the 2, 3, 4, or 5 is matched as a standalone number) and replace .*? with [^#]* so that the captured group cannot run across a # delimiter into a later field.
That is, you can use
\b[2-5]#([^#]*)#FR#
See the regex demo. Details:
\b - a word boundary
[2-5] - a digit from 2 to 5
# - a # char
([^#]*) - Group 1: zero or more chars other than #
#FR# - a #FR# string.
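The question does not say which language or tool runs the regex, so here is a minimal sketch assuming Python and its standard re module. The function name extract_french_locations and the constant SAMPLE are made up for illustration; the input is the sample line from the question. Because the pattern has a single capturing group, re.findall returns the contents of Group 1 for every match rather than only the first one.

```python
import re

# Sample line copied from the question.
SAMPLE = (
    "1#Tunisia#TS#TS#34#9#TS;"
    "4#Virsac, Aquitaine, France#FR#FR97#45.0333#-0.45#-1477568;"
    "4#Gironde, Aquitaine, France#FR#FR97#44.584#-0.089244#-1429418"
)

# \b      - word boundary, so the digit is a standalone number
# [2-5]#  - a digit from 2 to 5 followed by #
# ([^#]*) - Group 1: the location, which cannot contain #
# #FR#    - the country code field
PATTERN = re.compile(r"\b[2-5]#([^#]*)#FR#")


def extract_french_locations(text):
    """Return every Group 1 capture found in text, in order of appearance."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    for location in extract_french_locations(SAMPLE):
        print(location)
```

Run against the sample, this should print the two expected lines, Virsac and Gironde. In other regex engines the equivalent is usually a "global" or "find all" mode applied to the same pattern.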