I am trying to match the names of PDF files in my folders against a pattern.
I have many PDF files that follow the same pattern, a date followed by a name, but with a different name at the end.
The problem is that when I run the script, every file is treated as matching the first pattern (python_pt) and the elif branch is never reached.
Example:
10-11-2021 python.pdf
22-09-2021 java.pdf
Code:
import re
import os
from os import path
from tqdm import tqdm
from time import sleep

python_pt= "^[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$ python.pdf"
java_pt1= "^[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$ java.pdf"
java_pt2= "^ java [0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$.pdf"
src = 'c:'
a = 0
i = 0
for dirpath, dirnames, files in os.walk(src, topdown=True):
    print(f'\nFound directory: {dirpath}\n')
    for file in tqdm(files):
        sleep(.1)
        full_file_name = os.path.join(dirpath, file)
        if os.path.join(dirpath) == src:
            if file.endswith("pdf"):
                if python_pt:
                    i += 1
                elif java_pt1 or java_pt2:
                    a += 1
print("{} file 1 \n".format(i))
print("{} file 2 \n".format(a))
Answer:
There are two problems: your regular expressions, and the way you check for a match.
- Anchors cannot be placed arbitrarily inside a pattern. A $ in the middle makes the pattern impossible to match, because $ asserts the end of the string and no characters can follow it. Since you need to check whether file names end with your pattern, put $ at the end only, and do not forget to escape the literal dot (\.).
- To check whether a string matches, you need to call one of the re.search / re.match / re.fullmatch functions. A bare if python_pt: only tests whether the pattern string is non-empty, which is always true, so the first branch is always taken and the elif is never reached.
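Both problems can be reproduced in isolation with the two file names from the question. This is a minimal sketch using only the standard re module; the variable names are illustrative:

```python
import re

# Pattern from the question: "$" sits in the middle, so " python.pdf" would have to follow the end of the string.
broken = "^[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$ python.pdf"
# Corrected pattern: "$" only at the end, and the literal dot escaped.
fixed = r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2} python\.pdf$"

py_name = "10-11-2021 python.pdf"
java_name = "22-09-2021 java.pdf"

# `if broken:` only asks whether the string is non-empty, which is always True.
always_true = bool(broken)
# The broken pattern cannot match anything, not even the python file.
broken_hit = re.search(broken, py_name)
# The corrected pattern tells the two files apart.
py_hit = re.search(fixed, py_name) is not None
java_hit = re.search(fixed, java_name) is not None
# An unescaped "." matches any character, so this accepts a name it should not.
loose_hit = re.search(r"python.pdf$", "pythonXpdf") is not None

print(always_true, broken_hit, py_hit, java_hit, loose_hit)
# True None True False True
```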
Here is a fixed snippet:
import re, os
from os import path
from tqdm import tqdm
from time import sleep

python_pt= r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2} python\.pdf$" # FIXED
java_pt1= r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2} java\.pdf$" # FIXED
java_pt2= r"java [0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}\.pdf$" # FIXED
src = "C:"
i = 0
a = 0
for dirpath, dirnames, files in os.walk(src, topdown=True):
    print(f'\nFound directory: {dirpath}\n')
    for file in tqdm(files):
        sleep(.1)
        full_file_name = os.path.join(dirpath, file)
        if os.path.join(dirpath) == src:
            if file.endswith("pdf"):
                if re.search(python_pt, file): # FIXED
                    i += 1
                elif re.search(java_pt1, file) or re.search(java_pt2, file): # FIXED
                    a += 1
print("{} file 1 \n".format(i))
print("{} file 2 \n".format(a))
See the lines marked # FIXED.
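If you would rather require the whole file name to match (so that, for example, "my 10-11-2021 python.pdf" is rejected), one option is re.fullmatch with precompiled patterns, which anchors both ends without any ^ or $. This is a minimal sketch that only classifies bare file names; classify and count_pdfs are hypothetical helper names, not part of the code above:

```python
import re

# Shared date part: day-month-year with a 2- or 4-digit year.
DATE = r"[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}"
# fullmatch anchors both ends, so no ^ or $ is needed.
PYTHON_PT = re.compile(DATE + r" python\.pdf")
JAVA_PTS = (
    re.compile(DATE + r" java\.pdf"),          # e.g. "22-09-2021 java.pdf"
    re.compile(r"java " + DATE + r"\.pdf"),    # e.g. "java 22-09-2021.pdf"
)


def classify(file_name):
    """Hypothetical helper: return 'python', 'java' or None for one file name."""
    if PYTHON_PT.fullmatch(file_name):
        return "python"
    if any(p.fullmatch(file_name) for p in JAVA_PTS):
        return "java"
    return None


def count_pdfs(file_names):
    """Hypothetical helper: count matches per category in an iterable of file names."""
    counts = {"python": 0, "java": 0}
    for name in file_names:
        kind = classify(name)
        if kind is not None:
            counts[kind] += 1
    return counts


print(count_pdfs(["10-11-2021 python.pdf", "22-09-2021 java.pdf",
                  "java 22-09-2021.pdf", "notes.pdf"]))
# {'python': 1, 'java': 2}
```

Inside the os.walk loop you could call classify(file) in place of the if/elif chain. If you only care about the top-level folder, next(os.walk(src))[2] gives its file names, which could be passed straight to count_pdfs.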