Python regex - updating file by adding character for all the matched pattern

Time:04-11

I am trying to create a script to:

  1. read all the text files in my folder
  2. find the words that match the pattern r'\d\d\d\d' "H" (e.g. 1234H)
  3. replace them with a time format (e.g. 12:34:00)
  4. save the file

Currently my code is this, and I'm not sure what went wrong. Please advise, thank you!


import os
import re

path = r'C:\Users\CL\Desktop\regex'

for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'): #find all .txt files
            path = os.path.join(root, file)
            f = open(path,'a')
            pattern = r'\d\d\d\d' "H" #pattern
            replacewords = re.findall(pattern, f) #find all words with this pattern
            
            ...... #replace matched words with eg. 12:23:00
            
            f.write() #save file
            f.close()

sample text content:

1111H, 1234H, 1115H

CodePudding user response:

You can use

import os, re

path = r'C:\Users\CL\Desktop\regex'

for root, dirs, files in os.walk(path):
    for file in files:
        if file.lower().endswith('.txt'): #find all .txt / .TXT files
            path = os.path.join(root, file)
            pattern = r'(\d{2})(\d{2})H' # pattern
            with open(path, 'r+') as f:  # Read and update
                contents = re.sub(pattern, r'\1:\2:00', f.read())
                f.seek(0)
                f.truncate()
                f.write(contents)
            

NOTE:

  • if file.lower().endswith('.txt') makes text file search case insensitive
  • (\d{2})(\d{2})H pattern matches and captures the first two digits in Group 1 and the next two digits before H into Group 2
  • When replacing, \1 refers to Group 1 value and \2 refers to Group 2 value
  • The file mode is set to r+ so that the file can be both read and updated.
  • The f.seek(0) and f.truncate() allow re-writing the file contents with the updated contents.
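
To check the substitution without touching real files, here is a minimal sketch that runs the same read, replace, and rewrite steps on a throwaway file holding the sample text from the question. The names convert_times and update_file are hypothetical helpers, and the \b word boundaries are an added assumption (not part of the pattern above) so that a longer digit run such as 12345H is left alone.

```python
import re
import tempfile
from pathlib import Path

# Same two capture groups as the answer's pattern; the \b anchors are an
# added assumption so that longer tokens like 12345H are not partially matched.
TIME_PATTERN = re.compile(r'\b(\d{2})(\d{2})H\b')


def convert_times(text):
    """Replace every four-digit token ending in H (e.g. 1234H) with HH:MM:00."""
    return TIME_PATTERN.sub(r'\1:\2:00', text)


def update_file(file_path):
    """Rewrite one file in place: read, rewind, clear, write back."""
    with open(file_path, 'r+') as f:   # r+ opens for reading and writing
        contents = convert_times(f.read())
        f.seek(0)        # move back to the start of the file
        f.truncate()     # drop the old contents
        f.write(contents)


# Try it on a temporary file instead of the real folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_text('1111H, 1234H, 1115H')
    update_file(sample)
    result = sample.read_text()
    print(result)  # 11:11:00, 12:34:00, 11:15:00
```

Once the output looks right, update_file could be called from inside the os.walk loop in place of the with block.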