I have a list of file name strings similar to this (but very long):
list = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
I am looking to make another list out of this one by taking only the strings that match 4 conditions:
- Start with substring "AB"
- End with substring ".txt"
- Between "AB" and ".txt" there must be any 7-digit number
- There are no other substrings in the string (i.e. only the 3 items above can be in the string)
Therefore in this case the desired result would be this list:
list2 = ['AB8372943.txt', 'AB1234567.txt']
So far I know that to check for a 7 digit number I can use:
list2 = [i for i in list if re.findall(r"\d{7}", i)]
I also know how to look for substrings within the strings. But it isn't enough for the strings to just contain the substrings: each one needs to start and end with a specific substring, have a 7-digit number in the middle, and contain nothing else. Is there a way to do this?
Thank you so much in advance!
CodePudding user response:
To also make sure it starts with AB and ends with .txt, anchor the pattern at both ends and escape the dot so that it only matches a literal ".":
import re
my_list = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
my_list2 = [i for i in my_list if re.findall(r"^AB\d{7}\.txt$", i)]
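A quick way to see why the dot matters, as a sketch: in a regular expression an unescaped `.` matches any character except a newline, so `.txt` would also accept names that have something other than a period before `txt`. The sample names below are made up for illustration and are not from the original list.

```python
import re

# Hypothetical file names, for illustration only
names = ['AB1234567.txt', 'AB1234567xtxt', 'AB1234567_txt']

# Unescaped dot: "." matches any single character
unescaped = [n for n in names if re.search(r"^AB\d{7}.txt$", n)]

# Escaped dot: "\." matches only a literal period
escaped = [n for n in names if re.search(r"^AB\d{7}\.txt$", n)]

print(unescaped)  # all three names match
print(escaped)    # only 'AB1234567.txt' matches
```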
CodePudding user response:
You should avoid using a built-in name like list as a variable name. Also, since the string must not contain any other substrings, you can use re.match, which only matches at the start of the string.
AB\d{7}\.txt\Z
The pattern matches:
- AB\d{7} matches AB followed by 7 digits
- \.txt matches .txt (note the escaped dot)
- \Z asserts the end of the string
For example
import re
lst = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
lst2 = [s for s in lst if re.match(r"AB\d{7}\.txt\Z", s)]
print(lst2)
Output
['AB8372943.txt', 'AB1234567.txt']
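As an alternative sketch, `re.fullmatch` (available since Python 3.4) requires the whole string to match, so no explicit end anchor is needed. The helper name `is_wanted` and the `with_newline` sample are made up for this example. The last two lines show one place where the choice of anchor can matter: `$` also matches just before a trailing newline, while `fullmatch` (like `\Z`) does not.

```python
import re

# Compiled once; fullmatch() must consume the entire string
PATTERN = re.compile(r"AB\d{7}\.txt")

def is_wanted(name):
    """Return True only if name is exactly AB + 7 digits + .txt."""
    return PATTERN.fullmatch(name) is not None

lst = ['AB8372943.txt', 'test.pdf', '123485940.docx', 'CW2839502.txt', 'AB1234567.txt', '283AB.txt']
lst2 = [s for s in lst if is_wanted(s)]
print(lst2)  # ['AB8372943.txt', 'AB1234567.txt']

# Hypothetical name with a trailing newline, for illustration only
with_newline = 'AB1234567.txt\n'
print(is_wanted(with_newline))                         # False
print(bool(re.match(r"AB\d{7}\.txt$", with_newline)))  # True
```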