How to sub first character in a text file with regex in Python

Time:06-14

I have some lowercase texts and I'm trying to change every lowercase 'i' to uppercase 'I' (including " i'" to " I'"). I have a text file ('i_test.txt') with this text to test my code:

    i am a bumble 
    bee i'm not the 
    prodigal son
    i'll stop talking 
    now it's all done i think

This script amends all cases except for the first character in the text file (adapted from "How can I do multiple substitutions using regex?"):

    import re, os

    file = 'i_test.txt'

    def multiple_replace(dict, text):
      # Create a regular expression  from the dictionary keys
      regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

      # For each match, look-up corresponding value in dictionary
      return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

    if __name__ == "__main__":

        dict = {
        "^i " : "I ",
        " i " : " I ",
        "\ni " : "\nI ",
        " i'" : " I'",
        "\ni'" : "\nI'",
        "^i'" : "I'",
        }

    with open(file) as text:
        new_text = multiple_replace(dict, text.read())
    with open("i_out.txt", "w") as result:
        result.write(new_text)

The output is:

    i am a bumble 
    bee I'm not the 
    prodigal son
    I'll stop talking 
    now it's all done I think

In the dictionary I am searching for 'i' preceded and followed by a space, and 'i' preceded by a newline and followed by a space (with similar patterns for i'). I attempted to amend the first character with this regex:

    "^i " : "I ",

But it doesn't work. Is there a way to substitute the first character in a text file?

CodePudding user response:

You may not need a map covering all possible occurrences of the first person singular pronoun. I believe a regex replacement on \bi\b should give the result you want:

    import re

    inp = """i am a bumble 
    bee i'm not the 
    prodigal son
    i'll stop talking 
    now it's all done i think"""
    # \b is a word boundary, so only a standalone "i" is replaced
    output = re.sub(r'\bi\b', 'I', inp)
    print(output)

This prints:

    I am a bumble 
    bee I'm not the 
    prodigal son
    I'll stop talking 
    now it's all done I think
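
As for why the "^i " key in the question never matched: multiple_replace passes every key through re.escape, so the ^ becomes a literal caret rather than a start-of-string anchor. Below is a minimal sketch of that effect, together with one way the \bi\b replacement might be applied to a file. The helper names capitalize_pronoun and convert_file are made up for illustration and are not part of the original script.

```python
import re

# re.escape() turns "^" into a literal caret, so this pattern looks for
# the characters "^i " instead of an "i " at the start of the text.
escaped = re.escape("^i ")
assert re.search(escaped, "i am a bumble") is None


def capitalize_pronoun(text):
    """Uppercase every standalone lowercase 'i' (also in i'm, i'll)."""
    return re.sub(r"\bi\b", "I", text)


def convert_file(src, dst):
    """Read src, capitalize the pronoun, and write the result to dst.

    Hypothetical helper: the paths are supplied by the caller.
    """
    with open(src) as infile:
        new_text = capitalize_pronoun(infile.read())
    with open(dst, "w") as outfile:
        outfile.write(new_text)
```

With the file names from the question, the call would be convert_file("i_test.txt", "i_out.txt"). Note that \bi\b also matches a standalone i next to other punctuation (for example in "i.e."), so the pattern may need tightening for other texts.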