I want to replace certain characters in the file names of PDF files. My code so far:
for file in files:
    file_ed = file
    replace = [",", "-", "The ", " "]
    for item in replace:
        file_ed = file_ed.replace(item, "")
In addition, I would like to remove the dots "." in the file names. If I include "." in the replace list, though, it will also remove the dot in ".pdf", which is obviously not what I want. Any help is much appreciated.
CodePudding user response:
You can do all the replacements at once with re.sub (regex substitution), excluding the .pdf ending from the replacement via string slicing:
import re
fname = '-filename_,a,-.b.c d The., f.pdf'
# Clean everything except the last four characters (".pdf"), then re-attach them
new_fname = re.sub(r'(,|-|\.| |The)', '', fname[:-4]) + fname[-4:]
print(new_fname)
Output: filename_abcdf.pdf
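The slice above assumes the extension is exactly four characters long. As a variation, here is a minimal sketch that avoids slicing by removing only those dots that are followed by another dot later in the name. The helper name strip_chars is hypothetical, and the sketch assumes the last dot in the name always starts the extension.

```python
import re

# "The", commas, hyphens and spaces are always removed; a dot is removed
# only if another dot follows it somewhere later in the name.
PATTERN = re.compile(r"The|[,\- ]|\.(?=.*\.)")

def strip_chars(fname):
    """Hypothetical helper: clean a file name but keep its last dot."""
    return PATTERN.sub("", fname)

print(strip_chars("-filename_,a,-.b.c d The., f.pdf"))
```

Because the last dot is never followed by another dot, it survives regardless of how long the extension is.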
CodePudding user response:
A solution may be:
for file in files:
    index = file.rfind('.')  # last dot, so extensions longer than three characters work too
    file_ed, extension = file[:index], file[index:]
    replace = [",", "-", "The ", " ", "."]
    for item in replace:
        file_ed = file_ed.replace(item, "")
    file_ed += extension  # re-attach the untouched extension
    print(file_ed)
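One caveat with the rfind approach above: if a name contains no dot at all, rfind returns -1 and the slices cut off the last character instead. A possible sketch that guards against this with str.rpartition follows; the function name split_ext is hypothetical.

```python
def split_ext(name):
    """Hypothetical helper: split a name at its last dot, if it has one."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No dot found: rpartition puts the whole name in the last slot.
        return name, ""
    return stem, dot + ext

print(split_ext("a.b.pdf"))
print(split_ext("README"))
```

The returned stem can then be cleaned with the same replace loop, and the extension appended afterwards.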
CodePudding user response:
If all files have extensions, you can count the dots, and replace all but the last:
dots = file_ed.count(".")
file_ed = file_ed.replace(".", "", dots-1)
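To make the effect of the count argument concrete, here is a small sketch on a made-up file name. The function name drop_inner_dots is hypothetical; it relies on str.replace working from left to right, so limiting the count to one less than the number of dots leaves the last dot alone.

```python
def drop_inner_dots(name):
    """Hypothetical helper: remove every dot except the last one."""
    dots = name.count(".")
    if dots < 2:
        # Zero or one dot: nothing to remove.
        return name
    return name.replace(".", "", dots - 1)

print(drop_inner_dots("report.v2.final.pdf"))
```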
CodePudding user response:
You can use re here.
Example:
import re
string = "Hello, World. This is a| string and here is .pdf file."
print(re.sub(r'(?!\.pdf)[,.|]', '', string))
Output: Hello World This is a string and here is .pdf file
Here the negative lookahead excludes any "." that starts ".pdf" from the match. Note that this keeps every ".pdf" in the string, not only the one at the end.
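If only the final ".pdf" should be protected, one option is to anchor the lookahead to the end of the string with $. This is a sketch, not part of the answer above, and the function name strip_punct is hypothetical.

```python
import re

def strip_punct(name):
    """Hypothetical helper: drop ',', '.' and '|' but keep a trailing '.pdf'."""
    # (?!\.pdf$) rejects a match only where ".pdf" is the very end of the name.
    return re.sub(r"(?!\.pdf$)[,.|]", "", name)

print(strip_punct("my.pdf.notes, v1.pdf"))
```

With the anchor, a ".pdf" in the middle of the name loses its dot like any other, while the real extension stays intact.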
CodePudding user response:
To identify the file extension, I would reuse functionality already available in os or pathlib.Path.
With os:
import os
for file in files:
    file_stem, ext = os.path.splitext(file)
    replace = [",", "-", "The ", " ", "."]  # dot added
    for item in replace:
        file_stem = file_stem.replace(item, "")
    file_ed = f'{file_stem}{ext}'
Or with pathlib.Path:
from pathlib import Path
for file in files:
    file_p = Path(file)
    file_stem = file_p.stem
    ext = file_p.suffix
    replace = [",", "-", "The ", " ", "."]  # dot added
    for item in replace:
        file_stem = file_stem.replace(item, "")
    file_ed = f'{file_stem}{ext}'
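If the files live in subdirectories, the pathlib variant can also keep the directory part intact. The following is a minimal sketch under that assumption; the function name cleaned_path and the sample path are made up for illustration.

```python
from pathlib import Path

REPLACE = [",", "-", "The ", " ", "."]

def cleaned_path(path):
    """Hypothetical helper: return the path with a cleaned file stem."""
    p = Path(path)
    stem = p.stem
    for item in REPLACE:
        stem = stem.replace(item, "")
    # with_name swaps only the final component, so the folder is preserved.
    return p.with_name(stem + p.suffix)

print(cleaned_path("docs/The big-report, v1.2.pdf"))
```

To rename a file on disk, the result could be passed to Path.rename, for example p.rename(cleaned_path(p)) for an existing Path p.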