I am trying to create a script to:
- read all the text files in my folder
- find the words that match the pattern [r'\d\d\d\d' "H"] (e.g. 1234H)
- replace them with a time string (e.g. 1234H becomes 12:34:00)
- save the file
Currently my code is this, and I am not sure where it went wrong. Please advise, thank you!
import os
import re

path = r'C:\Users\CL\Desktop\regex'
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'):  # find all .txt files
            path = os.path.join(root, file)
            f = open(path, 'a')
            pattern = r'\d\d\d\d' "H"  # pattern
            replacewords = re.findall(pattern, f)  # find all words with this pattern
            ......  # replace matched words with eg. 12:23:00
            f.write()  # save file
            f.close()
Sample text content:
1111H, 1234H, 1115H
CodePudding user response:
You can use
import os, re

path = r'C:\Users\CL\Desktop\regex'
for root, dirs, files in os.walk(path):
    for file in files:
        if file.lower().endswith('.txt'):  # find all .txt / .TXT files
            path = os.path.join(root, file)
            pattern = r'(\d{2})(\d{2})H'  # pattern
            with open(path, 'r+') as f:  # Read and update
                contents = re.sub(pattern, r'\1:\2:00', f.read())
                f.seek(0)
                f.truncate()
                f.write(contents)
NOTE:
- if file.lower().endswith('.txt') makes the text file search case insensitive.
- The (\d{2})(\d{2})H pattern matches and captures the first two digits in Group 1 and the next two digits before H in Group 2.
- When replacing, \1 refers to the Group 1 value and \2 refers to the Group 2 value.
- The file mode is set to r+ so that the file can be both read and updated.
- f.seek(0) and f.truncate() allow re-writing the file with the updated contents.
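
To try the substitution and the read-then-overwrite step without touching real files, here is a minimal sketch that runs them against a temporary file holding the sample text from the question. The helper names convert_times and rewrite_file and the file name demo.txt are made up for this illustration; the pattern and the r+ / seek / truncate sequence are the ones described in the notes above.

```python
import os
import re
import tempfile

# Same pattern as in the answer: two digits, two digits, then a literal H.
TIME_PATTERN = re.compile(r'(\d{2})(\d{2})H')


def convert_times(text):
    """Return text with every 1234H-style token rewritten as 12:34:00."""
    return TIME_PATTERN.sub(r'\1:\2:00', text)


def rewrite_file(path):
    """Read a file, convert its contents, and overwrite it in place."""
    with open(path, 'r+') as f:  # r+ allows both reading and writing
        contents = convert_times(f.read())
        f.seek(0)      # move back to the start of the file
        f.truncate()   # discard the old contents
        f.write(contents)


# Demo on a throwaway file (hypothetical name) with the question's sample text.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.txt')
    with open(demo_path, 'w') as f:
        f.write('1111H, 1234H, 1115H')
    rewrite_file(demo_path)
    with open(demo_path) as f:
        result = f.read()

print(result)  # 11:11:00, 12:34:00, 11:15:00
```

Like the pattern in the answer, this also matches the last four digits of a longer number (12345H would become 123:45:00). If that matters for your files, putting \b at the start of the pattern is one option.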