Replace substring with exception

Time:12-26

I want to remove certain characters from the file names of PDF files. My code so far:

for file in files:
    file_ed = file
    replace = [",","-", "The "," "]
    for item in replace:
        file_ed = file_ed.replace(item,"")

In addition, I would like to remove dots "." from the file names. If I include "." in the replace list, though, it also removes the dot in ".pdf", which is obviously not what I want. Any help is much appreciated.

CodePudding user response:

You can do all the replacements at once with re.sub (regex substitution), excluding the .pdf ending from the replacement via string slicing:

import re

fname = '-filename_,a,-.b.c d The., f.pdf'
new_fname = re.sub(r'(,|-|\.| |The)', '', fname[:-4]) + fname[-4:]
print(new_fname)

filename_abcdf.pdf
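
Note that fname[:-4] assumes a four-character ending such as ".pdf". A minimal sketch of the same idea that builds the pattern from the question's list and splits at the last dot instead; the helper name clean_name and the sample file name are assumptions for illustration, not part of the original answer:

```python
import re

# Substrings to strip: the question's list plus the dot.
REPLACE = [",", "-", "The ", " ", "."]

# re.escape makes "." literal; longer items go first so "The " is tried before " ".
PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(REPLACE, key=len, reverse=True))
)

def clean_name(fname):  # hypothetical helper name
    # Split at the last dot so the extension can have any length.
    stem, dot, ext = fname.rpartition(".")
    if not dot:  # no extension at all: clean the whole name
        return PATTERN.sub("", fname)
    return PATTERN.sub("", stem) + dot + ext

print(clean_name("The Big-Report, v1.2 final.pdf"))  # BigReportv12final.pdf
```

Because the list contains "The " with a trailing space, a "The" followed by any other character is left alone, just as in the question's loop.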

CodePudding user response:

A solution may be:

for file in files:
    index = file.rfind('.')  # split at the last dot, so extensions of any length work
    file_ed, extension = file[:index], file[index:]

    replace = [",", "-", "The ", " ", "."]
    for item in replace:
        file_ed = file_ed.replace(item, "")
    file_ed += extension
    print(file_ed)

CodePudding user response:

If all files have extensions, you can count the dots, and replace all but the last:

dots = file_ed.count(".")
file_ed = file_ed.replace(".", "", dots-1)
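
A small worked example of the counting approach, wrapped in a function; the name strip_inner_dots and the sample file names are assumptions for illustration:

```python
def strip_inner_dots(name):  # hypothetical helper name
    dots = name.count(".")
    # The third argument of str.replace limits how many occurrences are replaced.
    # max(..., 0) keeps the limit non-negative when the name has no dot at all
    # (a negative limit means "replace all", which is harmless here but less clear).
    return name.replace(".", "", max(dots - 1, 0))

print(strip_inner_dots("a.b.c.pdf"))   # abc.pdf
print(strip_inner_dots("report.pdf"))  # report.pdf
```

Since str.replace works from left to right, limiting it to dots - 1 replacements leaves exactly the last dot in place.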

CodePudding user response:

You can use re here.

Example:

import re

string = "Hello, World. This is a| string and here is .pdf file."

print(re.sub(r'(?!\.pdf)[,.|]', '', string))

Output: Hello World This is a string and here is .pdf file

The negative lookahead excludes any position where ".pdf" begins, so that dot is never removed.
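
One caveat: a lookahead without an end anchor protects every ".pdf" in the name, not only the extension. A short sketch comparing it with an anchored variant; the sample name and the variable names are assumptions for illustration:

```python
import re

name = "report.pdf.v2,final.pdf"

# Unanchored: any dot that starts ".pdf" is kept, even in the middle of the name.
unanchored = re.sub(r"(?!\.pdf)[,.|]", "", name)

# Anchored with $: only a ".pdf" at the very end of the name is kept.
anchored = re.sub(r"(?!\.pdf$)[,.|]", "", name)

print(unanchored)  # report.pdfv2final.pdf
print(anchored)    # reportpdfv2final.pdf
```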

CodePudding user response:

For the purpose of identifying the file extension, I would reuse functionality already available in os or pathlib.Path.

With os,

import os

for file in files:
    file_stem, ext = os.path.splitext(file)
    replace = [",", "-", "The ", " ", "."]  # dot added
    for item in replace:
        file_stem = file_stem.replace(item, "")
    file_ed = f'{file_stem}{ext}'

or with pathlib.Path,

from pathlib import Path

for file in files:
    file_p = Path(file)
    file_stem = file_p.stem
    ext = file_p.suffix
    replace = [",", "-", "The ", " ", "."]  # dot added
    for item in replace:
        file_stem = file_stem.replace(item, "")
    file_ed = f'{file_stem}{ext}'
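
If the entries in files include directories, the pathlib version can also rebuild the full path with Path.with_name. A minimal sketch, assuming a hypothetical helper clean_path and a made-up sample path:

```python
from pathlib import Path

def clean_path(path, replace=(",", "-", "The ", " ", ".")):  # hypothetical helper
    p = Path(path)
    stem = p.stem  # file name without its last suffix
    for item in replace:
        stem = stem.replace(item, "")
    # with_name swaps only the final path component and keeps the directory.
    return p.with_name(stem + p.suffix)

cleaned = clean_path("docs/The Big-Report, v1.2 final.pdf")
print(cleaned.name)  # BigReportv12final.pdf
```

Only the last suffix is treated as the extension here, so a name like "archive.tar.gz" would keep ".gz" and lose the dot before "tar".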