Regex matching multiple entries

Time:09-25

I'm trying to extract locations from France. Here is a sample:

1#Tunisia#TS#TS#34#9#TS;4#Virsac, Aquitaine, France#FR#FR97#45.0333#-0.45#-1477568;4#Gironde, Aquitaine, France#FR#FR97#44.584#-0.089244#-1429418

It's basically a city, its region and its country. Hence, I did this:

^[2-5]#(.*?)#FR#

The result is:

Gironde, Aquitaine, France

This correctly extracts the city/region/country, but only for one entry. Is it possible to extract multiple entries? The expected result would be:

Virsac, Aquitaine, France
Gironde, Aquitaine, France

Thanks in advance,

Answer:

Building off your current pattern, you need to replace the ^ anchor, which only allows a match at the start of the string (or of each line in multiline mode), with a word boundary (\b), so that the 2, 3, 4, or 5 is matched only as a standalone number. You also need to replace .*? with [^#]*, so that the captured text cannot run across # delimiters into neighboring fields.

That is, you can use

\b[2-5]#([^#]*)#FR#

Details:

  • \b - a word boundary
  • [2-5] - a digit from 2 to 5
  • # - a # char
  • ([^#]*) - Group 1: zero or more chars other than #
  • #FR# - a #FR# string.
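To get every entry rather than just one, the pattern also has to be applied as a "find all matches" operation in whatever tool runs it. The question does not say which regex flavor or language is used, so the following is only a sketch assuming Python's re module, where findall returns all non-overlapping matches (the Group 1 text, since the pattern has one group). The second record, mixed, is a hypothetical string invented for illustration. It shows how .*? can swallow a preceding non-French entry while [^#]* cannot.

```python
import re

# The answer's pattern: \b stops [2-5] from matching inside a longer number
# (e.g. the "5" in "0.45#"), and [^#]* keeps the capture inside one field.
FR_PATTERN = re.compile(r"\b[2-5]#([^#]*)#FR#")

# Sample line from the question.
sample = (
    "1#Tunisia#TS#TS#34#9#TS;"
    "4#Virsac, Aquitaine, France#FR#FR97#45.0333#-0.45#-1477568;"
    "4#Gironde, Aquitaine, France#FR#FR97#44.584#-0.089244#-1429418"
)

# findall scans the whole string and returns Group 1 of every match.
french_locations = FR_PATTERN.findall(sample)
for loc in french_locations:
    print(loc)

# Hypothetical record (not from the question): a non-French entry that also
# starts with "4#" comes before a French one.
mixed = "4#Berlin, Germany#GM#GM;4#Virsac, Aquitaine, France#FR#FR97"
lazy = re.findall(r"\b[2-5]#(.*?)#FR#", mixed)  # .*? may cross '#' delimiters
strict = FR_PATTERN.findall(mixed)              # [^#]* stays within one field
print(lazy)    # ['Berlin, Germany#GM#GM;4#Virsac, Aquitaine, France']
print(strict)  # ['Virsac, Aquitaine, France']
```

On the sample line this prints the two expected entries, Virsac and Gironde. With .*?, the match that starts at the Berlin entry keeps expanding until it reaches the first #FR#, so it captures several fields at once. With [^#]*, that attempt fails at #GM#, and the next match starts cleanly at the Virsac entry.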