Replace character in file name with regex python

Time:11-10

My script should find files in a directory whose names contain the "|" character and, using a regex, replace that character with an "l".

The code runs but filenames are not replaced. What is wrong?

#!/usr/bin/python

import os
from posixpath import dirname
import re
import glob
import fnmatch

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m' 

#Path
file_src = dirname(os.path.abspath(__file__))

#Current directory name
print(bcolors.OKBLUE + bcolors.BOLD + 'Directory:', file_src)
'\n'

#List all files in directory
list_file = os.listdir(file_src)
print(bcolors.BOLD + 'In this directory:', '\n', list_file)
'\n'

#Finding all the "|" characters in a string
file_pattern = re.compile('[\\":<>;|*?]*')


#Replace "|" with "l"
list = str(list_file)
re.sub(file_pattern, 'l', list, re.I)

Answer:

There are a few problems with your example:

list = str(list_file)

This line is

  • shadowing a Python built-in (don't name a variable list; doing so hides the built-in list type),
  • probably not doing what you think it's doing: it doesn't give you a list of strings, it gives you a single string representation of list_file, and
  • unnecessary, because your list_file is already a list of strings. I suspect you wrote this so that your re.sub call would operate on a single thing, but you're better off using a list comprehension.
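A minimal sketch of the three points above. It uses a made-up two-item listing in place of os.listdir() and a plain escaped-pipe pattern r'\|' instead of the question's file_pattern (which has problems of its own):

```python
import re

list_file = ['my|file.txt', 'notes.md']  # hypothetical directory listing

# str() turns the whole list into ONE string (its printed representation).
as_string = str(list_file)
print(type(as_string).__name__, as_string)

# re.sub returns a new string; it never changes its input in place.
re.sub(r'\|', 'l', as_string)  # result is discarded, as in the question
print(list_file)               # still contains 'my|file.txt'

# A list comprehension applies the substitution to each filename separately
# and keeps the results as a list of strings.
replaced = [re.sub(r'\|', 'l', f) for f in list_file]
print(replaced)                # ['mylfile.txt', 'notes.md']
```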

On to the next line:

re.sub(file_pattern, 'l', list, re.I)

re.sub returns a new string and leaves its input unchanged, so you'll need to perform that .sub for each str in your list_file and assign the result to a variable:

replaced_list_file = [re.sub(file_pattern, 'l', f) for f in list_file]  # re.I dropped: as re.sub's 4th positional argument it would be count, not flags

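But as multiple commenters have said, is that compiled pattern actually doing what you think it's doing? Its trailing * means "zero or more", so it also matches the empty string, and its character class covers far more than "|". Have a look at this link and see if the results are what you expect.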
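To see the difference concretely, here is a small comparison of the question's pattern (copied as-is) with an escaped pipe, applied to a hypothetical filename:

```python
import re

# The question's pattern: a character class followed by "*" (zero or more
# repetitions), so it can also match the empty string.
file_pattern = re.compile('[\\":<>;|*?]*')

# An escaped pipe matches exactly one literal "|" character.
pipe_pattern = re.compile(r'\|')

name = 'my|file.txt'  # hypothetical filename

print(repr(file_pattern.match('abc').group()))  # '' (an empty match)
print(file_pattern.sub('l', name))  # extra 'l's appear between characters
print(pipe_pattern.sub('l', name))  # mylfile.txt
```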

Answer:

Joshua's answer and the many comments, especially the suggestions from ekhumoro, already pointed out the issues and led toward the solution.

Fixed and improved

Here is my copy-paste-ready code, with some inline comments highlighting the changes:

#!/usr/bin/python

import os
from posixpath import dirname
import re
import glob
import fnmatch

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m' 
    RESET = '\u001b[0m' # added to get regular style

def print_list(files):
    '''Print a list, one element per line.'''
    for f in files:
        print(bcolors.OKBLUE + f + bcolors.RESET)

#Path
directory = dirname(os.path.abspath(__file__))

#Current directory name
print(bcolors.BOLD + 'Directory:' + bcolors.OKBLUE, directory)
print(bcolors.RESET)

#List all files in directory
files = os.listdir(directory)
print(bcolors.BOLD + 'In this directory:' + bcolors.OKBLUE, len(files), bcolors.RESET + 'files')
print_list(files)

#Finding all the "|" characters in a string
pipe_pattern = re.compile(r'\|')  # need to escape the special character pipe (in regex it means logical OR); the raw string keeps the backslash intact


#Replace "|" with "l"
renamed_files = []
for f in files:
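    f_renamed = pipe_pattern.sub('l', f)  # re.I dropped: as re.sub's 4th positional argument it means count, not flags
    if f_renamed != f:  # keep only names that actually changed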
        renamed_files.append(f_renamed)

# print the list of filenames, each on a separate line
print(bcolors.BOLD, "Renamed:" + bcolors.OKGREEN, len(renamed_files), bcolors.RESET + "files")
print_list(renamed_files)

Explanation

  • A simple regex to match a pipe character is \|
  • Note: a prepended backslash is required to escape special characters (like | for alternation/OR, \ for escaping, ( and ) for grouping, etc.). In Python source, write the pattern as a raw string (r'\|') so the backslash reaches the regex engine unchanged.
  • It is often useful to extract code blocks into functions (e.g. def print_list), which can then be tested easily.
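If you'd rather not escape metacharacters by hand, the standard library's re.escape() adds the backslash for you. A short sketch, with the unescaped pattern shown for contrast:

```python
import re

# Build the pattern from the literal character; re.escape() escapes it.
pipe_pattern = re.compile(re.escape('|'))
print(pipe_pattern.pattern)                # \|
print(pipe_pattern.sub('l', 'a|b|c.txt'))  # alblc.txt

# Unescaped, "|" is an alternation of two empty patterns: it matches the
# empty string at every position and leaves the pipe itself untouched.
print(re.sub('|', 'l', 'a|b'))
```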

Test your replacement

To test your replacement, a simple function helps. You can then try it on a made-up example first.

def replace_pipe(filename):
    return filename.replace('|', 'l')  # str.replace takes a plain substring, not a regex, so no escaping is needed

### Test it with an example first
print(replace_pipe('my|file.txt'))  # mylfile.txt
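The same idea can be pushed a little further with a handful of assert checks. This sketch adds a hypothetical regex-based replace_pipe_regex for comparison. The '|||' case is worth having: passing re.I as the fourth positional argument of re.sub, as in the question, sets count rather than flags (re.I equals 2), which limits the substitution to two pipes.

```python
import re

def replace_pipe(filename):
    # str.replace works on a plain substring, so no escaping is needed.
    return filename.replace('|', 'l')

def replace_pipe_regex(filename):
    # Hypothetical regex version for comparison; no flags are needed.
    return re.sub(r'\|', 'l', filename)

# Made-up filenames mapped to the names we expect.
cases = {
    'my|file.txt': 'mylfile.txt',
    'no_pipe.txt': 'no_pipe.txt',
    '|||': 'lll',  # more than two pipes
}

for original, expected in cases.items():
    assert replace_pipe(original) == expected, original
    assert replace_pipe_regex(original) == expected, original

# What the question's re.I argument actually does: it lands in count
# (written here as a keyword to make that explicit), so at most 2 pipes change.
limited = re.sub(r'\|', 'l', '|||', count=re.I)
print(limited)  # ll|
print('all checks passed')
```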

Once everything works as expected, add the further (critical) steps.

Avoid integrating the I/O layer too early

To elaborate on the important advice from ekhumoro: os.rename is a file-system operation at the I/O layer. It takes effect on your system immediately and cannot easily be undone.

So treat it as critical. If your renaming does not work as expected, all the files could be renamed into a cryptic mess; at worst, the effect is as harmful as ransomware.
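One way to keep the I/O layer separate is to compute the renames first and only call os.rename behind an explicit switch. The sketch below is an illustration under that assumption, not the answerer's code: plan_renames, apply_renames and the dry_run flag are hypothetical names, and the demo runs in a temporary directory because "|" is allowed in filenames on Linux and macOS but not on Windows.

```python
import os
import re
import tempfile

PIPE = re.compile(r'\|')  # matches one literal "|"

def plan_renames(directory):
    """Pure step: compute (old_path, new_path) pairs without touching any file."""
    plan = []
    for name in sorted(os.listdir(directory)):
        new_name = PIPE.sub('l', name)
        if new_name != name:
            plan.append((os.path.join(directory, name),
                         os.path.join(directory, new_name)))
    return plan

def apply_renames(plan, dry_run=True):
    """I/O step: calls os.rename only when dry_run is False."""
    for old, new in plan:
        if os.path.exists(new):
            # On POSIX, os.rename would silently replace an existing file.
            print('skip, target exists:', new)
            continue
        print('would rename' if dry_run else 'renaming', old, '->', new)
        if not dry_run:
            os.rename(old, new)

# Demo in a throwaway directory (POSIX only: "|" is not allowed in
# Windows filenames).
with tempfile.TemporaryDirectory() as tmp:
    for name in ('my|file.txt', 'plain.txt'):
        open(os.path.join(tmp, name), 'w').close()
    plan = plan_renames(tmp)
    apply_renames(plan)                 # dry run: only prints
    names_after_dry_run = sorted(os.listdir(tmp))
    apply_renames(plan, dry_run=False)  # now actually renames
    final_names = sorted(os.listdir(tmp))

print(names_after_dry_run)  # ['my|file.txt', 'plain.txt']
print(final_names)          # ['mylfile.txt', 'plain.txt']
```

The os.path.exists check matters because on Unix os.rename silently replaces an existing destination file, so a file already named mylfile.txt would otherwise be overwritten.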
