I have some lowercase text and I'm trying to change every standalone lowercase 'i' to uppercase 'I' (including " i'" to " I'"). I have a text file containing this text to test my code ('i_test.txt'):
i am a bumble
bee i'm not the
prodigal son
i'll stop talking
now it's all done i think
This script amends all cases except for the first character in the text file (adapted from "How can I do multiple substitutions using regex?"):
import re, os
file = 'i_test.txt'
def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
    # For each match, look-up corresponding value in dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)
if __name__ == "__main__":
    dict = {
        "^i " : "I ",
        " i " : " I ",
        "\ni " : "\nI ",
        " i'" : " I'",
        "\ni'" : "\nI'",
        "^i'" : "I'",
    }
    with open(file) as text:
        new_text = multiple_replace(dict, text.read())
    with open("i_out.txt", "w") as result:
        result.write(new_text)
The output is:
i am a bumble
bee I'm not the
prodigal son
I'll stop talking
now it's all done I think
In the dictionary I am searching for 'i' preceded and followed by a space, or preceded by a newline and followed by a space (with similar patterns for i'). I attempted to amend the first character with this entry:
"^i " : "I ",
But it doesn't work. Is there a way to substitute the first character in a text file?
CodePudding user response:
You may not need a map covering all possible occurrences of the first-person singular pronoun. I believe a regex replacement on \bi\b should give the result you want:
import re
inp = """i am a bumble
bee i'm not the
prodigal son
i'll stop talking
now it's all done i think"""
output = re.sub(r'\bi\b', 'I', inp)
print(output)
This prints:
I am a bumble
bee I'm not the
prodigal son
I'll stop talking
now it's all done I think
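As for why the "^i " entry in the question never matched: multiple_replace passes every dictionary key through re.escape, so the caret is most likely being treated as a literal "^" character rather than as a start-of-string anchor. The sketch below illustrates that difference and then applies the \bi\b substitution to file contents. The helper names capitalize_i and convert_file are made up for illustration, and the file names are simply the ones used in the question.

```python
import re

sample = "i am a bumble\nbee i'm not the"

# multiple_replace() runs every key through re.escape, so "^" stops being
# an anchor and only matches a literal caret character.
escaped = re.compile(re.escape("^i "))
anchored = re.compile(r"^i ")

print(escaped.search(sample))           # None: the text contains no "^"
print(anchored.search(sample).start())  # 0: the anchor matches at the start


def capitalize_i(text):
    """Uppercase every standalone 'i' (hypothetical helper name)."""
    return re.sub(r"\bi\b", "I", text)


def convert_file(src, dst):
    """Read src, capitalize standalone 'i', and write the result to dst."""
    with open(src) as infile:
        new_text = capitalize_i(infile.read())
    with open(dst, "w") as outfile:
        outfile.write(new_text)
```

Calling convert_file("i_test.txt", "i_out.txt") would mirror the read-and-write steps from the question. One caveat: \bi\b matches any standalone "i", so an abbreviation such as "i.e." would become "I.e." as well.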