To find some words in a text file using regex and later print them in a different text file

Time:03-23

I need to use regex to find words such as inherited, INHERITANCE, Inheritable, etc. in a text file (origin.txt), and then write each of them, together with the number of the line where it was found, to a new text file (origin_spp.txt).
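A case-insensitive pattern is one way to cover all of these spellings at once. The following is a minimal sketch. It assumes every target word begins with "inherit", and the sample sentence is made up for illustration:

```python
import re

# Match whole words that start with "inherit", in any letter case.
# \b requires a word boundary before the match; \w* consumes the rest of the word.
pattern = re.compile(r"\binherit\w*", re.IGNORECASE)

sample = "Traits are inherited; INHERITANCE is Inheritable, but heritage is not."
print(pattern.findall(sample))
# ['inherited', 'INHERITANCE', 'Inheritable']
```

Unlike spelling out inherit|INHERIT|Inherit, re.IGNORECASE also matches mixed-case forms such as InHeRiT. The leading \b skips words like "disinherit"; drop it if those should match too.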

This is my code:

re_pattern_string = r'(?:inherit|INHERIT|Inherit)*\w'

print('Opening origin.txt')
with open('origin.txt', 'r') as in_stream:
    print('Opening origin_spp.txt')
    with open('origin_spp.txt', 'w') as out_stream:
        for num, line in enumerate (in_stream):
            re_pattern_object = re.compile(re_pattern_string)
            line = line.strip()
            inherit_list = line.split()
            temp_list = re_pattern_object.findall('line')
            complete = origin_list.append('temp_list')
            for word in temp_list:
                out_stream.write(str(num) + '\t{0}\n'.format(word))

print("Done!")
print('origin.txt is closed?', in_stream.closed)
print('origin_spp.txt is closed?', out_stream.closed)

if __name__ == '__main__':
    print(temp_list)

Can you help me, please? I am not getting anything and I don't know where the error is.

Thank you in advance

I need to write the words I am looking for, as found in origin.txt, to a different text file.

Each line of this new file must contain the line number in origin.txt plus the matching word(s).

Answer:

Your code had some problems:

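  • Calling re.compile inside the for loop is redundant; compile the pattern once, before the loop.
  • In re_pattern_object.findall('line') and origin_list.append('temp_list'), don't wrap the variable names in quotes: 'line' is the literal string "line", not the contents of the variable line. (Also, origin_list is never defined, and list.append returns None, so complete is always None.)
  • findall scans the whole string it is given, so there is no need to split each line into words; you only need to iterate over the lines to know which line number each match came from.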
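The first two points can be seen in a few lines. This is a small sketch; the pattern and the sample line are hypothetical, chosen only to make the difference visible:

```python
import re

# Compile once, outside any loop (hypothetical pattern for illustration).
pattern = re.compile(r"inherit\w*", re.IGNORECASE)
line = "INHERITANCE is inherited"

# Quoted: searches the literal text "line", so nothing is found.
print(pattern.findall('line'))   # []
# Unquoted: searches the contents of the variable.
print(pattern.findall(line))     # ['INHERITANCE', 'inherited']

# list.append() changes the list in place and returns None.
found = []
complete = found.append(pattern.findall(line))
print(complete)  # None
print(found)     # [['INHERITANCE', 'inherited']]
```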

Since you didn't provide sample input and output, I'm guessing at what you want:

import re

re_pattern_string = r'((?:inherit|INHERIT|Inherit)(\w*))'
originmain_list = []
re_pattern_object = re.compile(re_pattern_string)
print('Opening origin.txt')
with open('origin.txt', 'r') as in_stream:
    print('Opening origin_spp.txt')
    with open('origin_spp.txt', 'w') as out_stream:
        for num, line in enumerate(in_stream):
            temp_list = re_pattern_object.findall(line)
            for word in temp_list:
                out_stream.write(str(num) + '\t{0}\n'.format(word[0]))
                originmain_list.append((num, word[0]))

print("Done!")
print('origin.txt is closed?', in_stream.closed)
print('origin_spp.txt is closed?', out_stream.closed)
print(originmain_list)

If origin.txt contains:

inheritxxxxxxx some text INHERITccccc some text
Inheritzzzzzzzz some text
inherit some text INHERIT some text
Inherit some text

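then origin_spp.txt will contain the following (line number and word are separated by a tab; numbering starts at 0 because enumerate counts from 0):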

0   inheritxxxxxxx
0   INHERITccccc
1   Inheritzzzzzzzz
2   inherit
2   INHERIT
3   Inherit

The command line output will be:

Opening origin.txt
Opening origin_spp.txt
Done!
origin.txt is closed? True
origin_spp.txt is closed? True
[(0, 'inheritxxxxxxx'), (0, 'INHERITccccc'), (1, 'Inheritzzzzzzzz'), (2, 'inherit'), (2, 'INHERIT'), (3, 'Inherit')]
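In the code above, re_pattern_string contains two capturing groups, so findall returns (whole_match, suffix) tuples; that is why it writes word[0]. The following sketch shows one alternative. It uses finditer with m.group(0), which always gives the whole match, and it can start line numbers at 1 via enumerate(..., start=...). The helper name extract_matches and the in-memory io.StringIO streams are hypothetical, used so the example runs without files; with real files you would pass the open in_stream and out_stream instead.

```python
import io
import re

# Same alternatives as the answer's pattern, but the group is non-capturing.
PATTERN = re.compile(r"(?:inherit|INHERIT|Inherit)\w*")

def extract_matches(in_stream, out_stream, start=0):
    """Hypothetical helper: write 'line_number<TAB>word' for every match.

    Returns the list of (line_number, word) pairs that were written.
    """
    found = []
    for num, line in enumerate(in_stream, start=start):
        # finditer yields match objects; group(0) is the whole match,
        # regardless of how many capturing groups the pattern has.
        for m in PATTERN.finditer(line):
            out_stream.write(f"{num}\t{m.group(0)}\n")
            found.append((num, m.group(0)))
    return found

# In-memory streams stand in for origin.txt and origin_spp.txt.
source = io.StringIO(
    "inheritxxxxxxx some text INHERITccccc some text\n"
    "Inheritzzzzzzzz some text\n"
)
target = io.StringIO()
print(extract_matches(source, target, start=1))
# [(1, 'inheritxxxxxxx'), (1, 'INHERITccccc'), (2, 'Inheritzzzzzzzz')]
print(target.getvalue(), end="")
```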