Regex python - find match items on list that have the same digit between the second character "

Time:12-09

I have the following list :

imgs/foldeer/img_ABC_21389_1.tif.tif
imgs/foldeer/img_ABC_15431_10.tif.tif
imgs/foldeer/img_GHC_561321_2.tif.tif
imgs_foldeer/img_BCL_871125_21.tif.tif
...

I want to run a for loop that matches the string containing a specific number, namely the number between the second occurrence of "_" and ".tif.tif". For example, when the number is 1, the string to match is "imgs/foldeer/img_ABC_21389_1.tif.tif"; for number 2, it is "imgs/foldeer/img_GHC_561321_2.tif.tif".

For that, I wanted to use regex expression. Based on this answer, I have tested this regex expression on Regex101:

[^\r\n_]+\.[^\r\n_]+\_([0-9])

But this doesn't match anything, and it also doesn't guarantee an exact match on the number: if the number is 1, it might also select items with number 10.

My end goal is to match the items in the list that have the requested number between the 2nd occurrence of "_" and the first occurrence of ".tif", using a regex. I am looking for help with the regex itself.

CodePudding user response:

[0-9]*(?=\.tif\.tif)

This regex uses a lookahead to capture the last run of digits, the one directly before ".tif.tif" (which is what you're looking for).
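To get from the lookahead to an exact match on the requested number, one option is to convert the captured digits to int and compare. This is only a sketch: the helper name trailing_number is made up for illustration, and it uses [0-9]+ instead of [0-9]* so that an empty match is never returned.

```python
import re

# Assumption: one or more digits ("+" rather than "*") directly before ".tif.tif".
TRAILING_NUMBER = re.compile(r"[0-9]+(?=\.tif\.tif)")

def trailing_number(path):
    """Return the number before ".tif.tif" as an int, or None if there is none."""
    m = TRAILING_NUMBER.search(path)
    return int(m.group()) if m else None

paths = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]

wanted = 1
# Comparing ints means 1 never matches 10 or 21.
matches = [p for p in paths if trailing_number(p) == wanted]
print(matches)
```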

CodePudding user response:

I'll show you something that works and is just as ugly as regex, which I hate:

data = ["imgs/foldeer/img_ABC_21389_1.tif.tif",
"imgs/foldeer/img_ABC_21389_1.tif.tif",
"imgs/foldeer/img_ABC_15431_10.tif.tif",
"imgs/foldeer/img_GHC_561321_2.tif.tif",
"imgs_foldeer/img_BCL_871125_21.tif.tif"]

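numbers = [int(x.rsplit("_", 1)[-1].split(".")[0]) for x in data]
  • Splitting once from the right on "_" gives e.g. ["imgs/foldeer/img_ABC_21389", "1.tif.tif"]
  • extract the last element ("1.tif.tif")
  • split again, by the dot this time, and take the first element (that's your number as a string), then cast it to int
  • split("_", 3) would break on "imgs_foldeer/...": the extra "_" in the directory shifts the pieces, leaving "871125_21", which int() reads as 87112521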

Please keep in mind that this only works for the format you provided; there is no flexibility at all in this solution (on the other hand, regex doesn't give you any either).
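To go from the extracted numbers back to the matching items, a small filter is enough. The sketch below assumes the number always follows the last "_" in the path; the helper names number_of and items_with_number are hypothetical. Splitting from the right keeps underscores in the directory part (as in imgs_foldeer) out of the way.

```python
def number_of(path):
    """Extract the number between the last "_" and the first following "."."""
    tail = path.rsplit("_", 1)[-1]   # e.g. "21.tif.tif"
    return int(tail.split(".")[0])   # e.g. 21

def items_with_number(items, wanted):
    """Return the items whose trailing number equals `wanted` exactly."""
    return [x for x in items if number_of(x) == wanted]

data = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]

for wanted in (1, 2, 10, 21):
    print(wanted, items_with_number(data, wanted))
```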

CodePudding user response:

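Mostly without regex, if that is allowed (re is only used to pull the digits out of the last piece):

import re
s = 'imgs/foldeer/img_ABC_15431_10.tif.tif'
# Everything after the last "_", here "10.tif.tif"
last = s[s.rindex('_') + 1:]
# First run of digits in that piece
print(re.findall(r'\d+', last)[0])

Output: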

10
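If the re module should be avoided entirely, the same idea can be sketched with string methods only. This assumes the number is always followed directly by a "." as in the sample paths.

```python
s = 'imgs/foldeer/img_ABC_15431_10.tif.tif'

# Everything after the last "_", here "10.tif.tif"
last = s[s.rindex('_') + 1:]

# Text up to the first ".", which is the number as a string
number = last.split('.', 1)[0]

print(number)
```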

CodePudding user response:

Try this:

import re

s = '''imgs/foldeer/img_ABC_21389_1.tif.tif
imgs/foldeer/img_ABC_15431_10.tif.tif
imgs/foldeer/img_GHC_561321_2.tif.tif
imgs_foldeer/img_BCL_871125_21.tif.tif'''



number = 1
res1 = re.findall(rf".*_{number}\.tif.*", s)

number = 21
res21 = re.findall(rf".*_{number}\.tif.*", s)


print(res1)
print(res21)

Results

['imgs/foldeer/img_ABC_21389_1.tif.tif']
['imgs_foldeer/img_BCL_871125_21.tif.tif']
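Since the question starts from a list rather than one multi-line string, the same pattern can be applied per item inside a loop or comprehension. The following is a sketch under that assumption; find_by_number is a hypothetical helper name. The leading "_" in the pattern is what keeps 1 from matching 21, and the "\.tif" right after the number is what keeps 1 from matching 10.

```python
import re

def find_by_number(items, number):
    """Return the items that contain "_<number>.tif"."""
    # re.escape changes nothing for plain digits; it only guards against odd input.
    pattern = re.compile(rf"_{re.escape(str(number))}\.tif")
    return [item for item in items if pattern.search(item)]

items = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]

for number in (1, 2, 10, 21):
    print(number, find_by_number(items, number))
```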

CodePudding user response:

In your question, the numbers that you are referring to come after the 3rd occurrence of _ in the file name.

Using a capture group:

/(?:[^\s_/]+_){3}(\d+)\.tif\b[^\s/]*$

The pattern matches:

  • / Match literally
  • (?:[^\s_/]+_){3} Repeat 3 times: 1+ chars other than whitespace, _ or /, followed by _
  • (\d+) Capture group 1, match 1+ digits
  • \.tif\b[^\s/]* Match .tif followed by optional chars other than whitespace or /
  • $ End of string

Regex demo

Example using re.findall to return the capture group 1 values:

import re

pattern = r"/(?:[^\s_/]+_){3}(\d+)\.tif\b[^\s/]*$"

s = ("imgs/foldeer/img_ABC_21389_1.tif.tif\n"
            "imgs/foldeer/img_ABC_15431_10.tif.tif\n"
            "imgs/foldeer/img_GHC_561321_2.tif.tif\n"
            "imgs_foldeer/img_BCL_871125_21.tif.tif")

print(re.findall(pattern, s, re.M))

Output

['1', '10', '2', '21']
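To look items up by number rather than just list the numbers, the capture group can be used as a dictionary key. This is a sketch that applies the pattern to each list item separately, so re.M is not needed; the name by_number is made up for illustration.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"/(?:[^\s_/]+_){3}(\d+)\.tif\b[^\s/]*$")

items = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]

# Map each captured number (as int) to the items that carry it.
by_number = defaultdict(list)
for item in items:
    m = PATTERN.search(item)
    if m:
        by_number[int(m.group(1))].append(item)

print(by_number[1])
print(by_number[10])
```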