Python: Check if strings in list match very specific conditions

Time:11-28

I have a list of file name strings similar to this (but very long):

list = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']

I am looking to make another list out of this one by taking only the strings that match 4 conditions:

  1. Start with substring "AB"
  2. End with substring ".txt"
  3. Between "AB" and ".txt" there must be any 7 digit number
  4. There are no other substrings in the string (i.e. only the 3 items above can be in the string)

Therefore in this case the desired result would be this list:

list2 = ['AB8372943.txt', 'AB1234567.txt']

So far I know that to check for a 7 digit number I can use:

list2 = [i for i in list if re.findall(r"\d{7}", i)]

I also know how to look for substrings within the strings. But it isn't enough for the strings to just contain the substrings: they need to start and end with a specific one, have a 7 digit number in the middle, and that's it! Is there a way to do this?

Thank you so much in advance!

CodePudding user response:

To also make sure it starts with AB and ends with .txt:

import re
my_list = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
my_list2 = [i for i in my_list if re.search(r"^AB\d{7}\.txt$", i)]  # \. matches a literal dot
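
As a side check, here is a minimal sketch of why the dot before txt is worth escaping in a pattern like this: an unescaped . matches any character, so a name with no real extension separator can slip through. The sample names below are made up for illustration and are not from the original list.

```python
import re

# Hypothetical sample names, not taken from the question.
names = ['AB1234567.txt', 'AB1234567xtxt']

# "." matches any character, so the "x" before "txt" is accepted.
loose = [n for n in names if re.search(r"^AB\d{7}.txt$", n)]

# "\." matches a literal dot only.
strict = [n for n in names if re.search(r"^AB\d{7}\.txt$", n)]

print(loose)   # ['AB1234567.txt', 'AB1234567xtxt']
print(strict)  # ['AB1234567.txt']
```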

CodePudding user response:

You should avoid using a built-in name like list as a variable name. Also, since the string must not contain anything besides those parts, you can use re.match, which starts matching at the beginning of the string, so only the end needs an explicit anchor.

AB\d{7}\.txt\Z

The pattern matches:

  • AB\d{7} Match AB and 7 digits
  • \.txt Match .txt (note that the dot is escaped)
  • \Z End of string
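
To illustrate why \Z is used here rather than $, the following small sketch compares the two on a name with a stray trailing newline. In Python, $ also matches just before a newline at the very end of the string, while \Z matches only at the true end. The test string is a hypothetical example.

```python
import re

# Hypothetical file name with a stray trailing newline.
name = "AB1234567.txt\n"

# "$" matches at the end of the string OR just before a final newline.
dollar_hit = re.match(r"AB\d{7}\.txt$", name) is not None

# "\Z" matches only at the very end of the string.
z_hit = re.match(r"AB\d{7}\.txt\Z", name) is not None

print(dollar_hit)  # True
print(z_hit)       # False
```

If the names come from reading a file line by line, this difference can matter unless the lines are stripped first.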

For example

import re

lst = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
lst2 = [s for s in lst if re.match(r"AB\d{7}\.txt\Z", s)]
print(lst2)

Output

['AB8372943.txt', 'AB1234567.txt']
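
A possible alternative, sketched here as an assumption rather than as part of the answer above, is re.fullmatch (available since Python 3.4), which requires the whole string to match so the pattern needs no anchors at all. The helper name select_matching and the constant PATTERN are hypothetical, introduced only for this example.

```python
import re

# Compiled once and reused; no ^, $ or \Z needed because fullmatch
# only succeeds when the entire string matches the pattern.
PATTERN = re.compile(r"AB\d{7}\.txt")

def select_matching(names):
    """Return the names made up of exactly AB, 7 digits, and .txt."""
    return [s for s in names if PATTERN.fullmatch(s)]

lst = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
print(select_matching(lst))  # ['AB8372943.txt', 'AB1234567.txt']
```

Compiling the pattern up front is mostly a readability choice for a long list; the re module also caches recently used patterns internally.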
