list comprehension with Regex to match whole item if it has matching number between two specific cha

Time:12-11

This question is a continuation of this post. I have the following list:

list_paths=['imgs/foldeer/img_ABC_21389_1.tif.tif',
'imgs/foldeer/img_ABC_15431_10.tif.tif',
'imgs/foldeer/img_GHC_561321_2.tif.tif',
'imgs_foldeer/img_BCL_871125_21.tif.tif',
...]

I want to run a for loop that matches the string containing a specific number, where the number is the part of the file name between the third occurrence of "_" and ".tif.tif". For example, when the number is 1, the string to be matched is "imgs/foldeer/img_ABC_21389_1.tif.tif".

For number 2, the matched string will be "imgs/foldeer/img_GHC_561321_2.tif.tif".

For that, I wanted to use a regex inside a list comprehension. Based on this answer, I have tested the following regex on Regex101:


import re

number = 10
pattern = rf"^\S*?/(?:[^\s_/]+_){{3}}{number}\.tif\b[^\s/]*$"

indices = [x for x in list_paths if re.search(pattern, x)]

But in my tests this doesn't match anything, and I am also not sure it matches the exact number: if the number is 1, it might also select items with the number 10.

My end goal is to match the items in the list that have the requested number between the third occurrence of "_" in the file name and the first occurrence of ".tif", using a regex. I am looking for help with the regex itself.

The output should be the whole path and not only the number.

CodePudding user response:

You can simplify your existing regex pattern a bit by matching the ending .tif.tif exactly:

import re
data=['imgs/foldeer/img_ABC_21389_1.tif.tif',
'imgs/foldeer/img_ABC_15431_10.tif.tif',
'imgs/foldeer/img_GHC_561321_2.tif.tif',
'imgs_foldeer/img_BCL_871125_21.tif.tif']

number = 2
pattern = rf"^\S*?/(?:[^\s_/]+_){{3}}{number}\.tif\.tif$"
print([x for x in data if re.search(pattern, x)])

Output:

['imgs/foldeer/img_GHC_561321_2.tif.tif']
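
To make it easier to see why this kind of pattern matches only the exact number, here is a sketch (not part of the original answer) that writes the same regex in re.VERBOSE mode, with one or more non-underscore characters before each underscore. The helper name build_pattern is hypothetical and only used for illustration; it assumes number is an integer.

```python
import re

def build_pattern(number):
    # Hypothetical helper: compiles the answer's pattern with inline comments.
    return re.compile(
        rf"""
        ^\S*?/               # lazily consume the directory part up to a slash
        (?:[^\s_/]+_){{3}}   # exactly three fields, each ending in an underscore
        {number}             # the requested number, interpolated as-is
        \.tif\.tif$          # literal suffix, anchored at the end of the string
        """,
        re.VERBOSE,
    )

data = ['imgs/foldeer/img_ABC_21389_1.tif.tif',
        'imgs/foldeer/img_ABC_15431_10.tif.tif',
        'imgs/foldeer/img_GHC_561321_2.tif.tif',
        'imgs_foldeer/img_BCL_871125_21.tif.tif']

print([x for x in data if build_pattern(1).search(x)])
# ['imgs/foldeer/img_ABC_21389_1.tif.tif']
```

Because the number must be preceded by the third underscore and followed immediately by a literal dot, 1 cannot match 10 or 21.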

This also covers the end goal stated in the question, namely that only the exact requested number should match. For example, with number set to 1:

number = 1
pattern = rf"^\S*?/(?:[^\s_/]+_){{3}}{number}\.tif\.tif$"
print([x for x in data if re.search(pattern, x)])

Output:

['imgs/foldeer/img_ABC_21389_1.tif.tif']

As you can see, when number is 1, only the path whose number is exactly 1 is matched, even though the data also contains a path with the number 10.
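
Since the question mentions running a for loop over several numbers, one alternative is to extract the number from each path once and group the paths by it, instead of building a new pattern per number. This is a different technique from the answer above, shown only as a sketch. It assumes the number is always the last underscore-delimited field before ".tif.tif"; the names trailing_number and by_number are made up for this example.

```python
import re
from collections import defaultdict

data = ['imgs/foldeer/img_ABC_21389_1.tif.tif',
        'imgs/foldeer/img_ABC_15431_10.tif.tif',
        'imgs/foldeer/img_GHC_561321_2.tif.tif',
        'imgs_foldeer/img_BCL_871125_21.tif.tif']

# Capture the digits between the last "_" and the ".tif.tif" suffix.
trailing_number = re.compile(r"_(\d+)\.tif\.tif$")

# Map each number to the full paths that carry it.
by_number = defaultdict(list)
for path in data:
    match = trailing_number.search(path)
    if match:
        by_number[int(match.group(1))].append(path)

for number in (1, 2, 10):
    # .get avoids creating empty entries for numbers that never occur
    print(number, by_number.get(number, []))
```

Comparing the captured digits as integers gives the same exact-match behavior: 1 and 10 end up under different keys.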
