i have some pdfs files that i need to create folders with part of your name and move the pdfs to the folders.
cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
for file in glob.glob("*.pdf"):
dst = cwd "\\" file.replace(str(padrao), '').replace('P', '')
os.mkdir(dst)
shutil.move(file, dst)
ex: I have the file P9883231_V1_A0V0_T07-54-369-664_S00001.pdf, P9883231_V1_A0V0_T07-54-369-664_S00002.pdf and P1235567_V1_A0V0_T07-54-369-664_S00001.pdf.
In this example I need the script to create two folders: 9883231 and 1234567. (the part in italics must be the name of the folder)
notice that in my code I remove the unwanted parts to create the folder, the 'P' at the beginning and part of padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
The problem is that at the end of the padrao the number can be variable, the file can end with "02.pdf" , "03.pdf"
In the example I mentioned above, the folder 9883231 should contain both files.
CodePudding user response:
Regular expressions can do the trick here:
import re
import os
import glob
import shutil
cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S000'
for file in glob.glob("*.pdf"):
dst = os.path.join(cwd, re.findall("P(.*)" padrao "\d{2}.pdf", file)[0])
os.mkdir(dst)
shutil.move(file, dst)
Notice that I remove the part of padrao
that varies.
The regex matches all strings that begin ith a P
, followed by the padrao
string value, followed by 2 digits, followed by .pdf
; and takes the first occurence (no check is made wether it found anything here ...)
Also, it is better practice to use os.path.join()
to avoid issues when creating path strings (when whanging os notably)