Extracting information from a text file using Python

Time:08-06

I have a text file with information that looks like this:

# found importantstuffhere
found request could not find identifier. Please check the name and try again.

I also have lines that look like this:

# found importantstuffhere
finding (identifier here) with blah blah blah.

I want to write Python code that will go through the text file and extract the following:

A. The first example is when the search failed, so I want to extract 'importantstuffhere' and the phrase 'found request could not find identifier'.

B. When it worked, as shown in the second example, I want to extract 'importantstuffhere' and the phrase 'finding (identifier here)'.

Is this possible with Python, and if so, how?

Bonus points:

Can the extracted values be placed in columns in a CSV or Excel file? For example:

column A    column B

Column A would hold importantstuffhere, and column B would say either 'found request could not find identifier' or 'finding (identifier here)'.

Thank you for your time!

Note: the # characters are part of the text file; I did not add them here just for clarification.

Essentially, I want to extract the values needed and add them to lists so that I can later make them columns in a dataframe. Perhaps list one has importantstuffhere and list two has the results.

CodePudding user response:

script.py:

# Read all lines; the with-block closes the file automatically.
with open('sampletext.txt', 'r') as f:
    lines = f.readlines()

important_stuff = []

# Each entry has the shape {'line_number': <int>, 'line_text': <str>}.
for line_number, text in enumerate(lines):
    if 'found request could not find identifier' in text:
        important_stuff.append({'line_number': line_number, 'line_text': text})

print(important_stuff)
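
The script above only collects the failure lines. To get both lists the question asks for, one option is to remember the name from each '# found ...' header and pair it with the phrase on the following line. This is a rough sketch, assuming the file always alternates a header line and a result line as in the two examples; `extract_pairs`, `HEADER`, `FAILURE` and `SUCCESS` are names made up for illustration.

```python
import re

# Assumed line shapes, taken from the two examples in the question.
HEADER = re.compile(r"^#\s*found\s+(\S+)")
FAILURE = "found request could not find identifier"
SUCCESS = re.compile(r"^finding \([^)]*\)")


def extract_pairs(lines):
    """Return two parallel lists: header names and result phrases."""
    names, results = [], []
    pending = None  # name from the most recent '# found ...' header
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            pending = header.group(1)
            continue
        if pending is None or not line:
            continue  # skip blank lines and lines with no header before them
        if line.startswith(FAILURE):
            result = FAILURE
        else:
            match = SUCCESS.match(line)
            result = match.group(0) if match else None
        if result is not None:
            names.append(pending)
            results.append(result)
        pending = None  # each header is paired with at most one result line
    return names, results


sample = [
    "# found importantstuffhere\n",
    "found request could not find identifier. Please check the name and try again.\n",
    "# found importantstuffhere\n",
    "finding (identifier here) with blah blah blah.\n",
]
names, results = extract_pairs(sample)
print(names)
print(results)
```

For a real file, the `lines` list read from sampletext.txt can be passed in place of `sample`. The two lists can then be used as dataframe columns.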

The following will read a file, gather the lines into one string, and write them to a csv separated by commas:

with open('sampletext.txt', 'r') as f:
    lines = f.readlines()

# Strip the newline from each line before joining; calling strip('\n') on the
# joined string would only remove newlines at its two ends.
text_without_line_breaks = ", ".join(line.strip('\n') for line in lines)

with open('fileName.csv', 'w') as csv_file:
    csv_file.write(text_without_line_breaks)
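
Joining everything with commas puts the whole file on a single CSV row. For the two-column layout from the question, the standard library `csv` module may be a better fit, since it writes one row per pair and handles quoting. A minimal sketch, assuming the names and results are already in two lists; `write_two_columns` is a hypothetical helper, and an in-memory buffer stands in for the file so the example runs on its own.

```python
import csv
import io


def write_two_columns(names, results, out):
    """Write a header row, then one (name, result) row per pair."""
    writer = csv.writer(out)
    writer.writerow(["column A", "column B"])
    writer.writerows(zip(names, results))


names = ["importantstuffhere", "importantstuffhere"]
results = [
    "found request could not find identifier",
    "finding (identifier here)",
]

buffer = io.StringIO()  # stands in for a real file object
write_two_columns(names, results, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, open it with `open('fileName.csv', 'w', newline='')` and pass the file object instead of `buffer`; the `newline=''` argument is what the `csv` documentation recommends to avoid extra blank lines on some platforms.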

To check for a string and then write the next line to a CSV file, I have this:

f = open('sampletext.txt', 'r')
lines = f.readlines()

csv_lines_to_write = []

SEARCH_TEXT = 'importantstuffhere'

for line_number, text in enumerate(lines):
    if text.find(SEARCH_TEXT) != -1:
        next_line_index = line_number + 1
        next_line_text = lines[next_line_index]
        assert type(SEARCH_TEXT) is str
        assert type(next_line_text) is str
        csv_line_to_write = SEARCH_TEXT, + ', ' + lines[next_line_index]
        csv_lines_to_write.append(csv_line_to_write)

with open('fileName.csv', 'w') as csv_file:
    for line in csv_lines_to_write:
        csv_file.write(text_without_line_breaks)

I'm getting this error:

csv_line_to_write = SEARCH_TEXT, + ', ' + lines[next_line_index]
TypeError: bad operand type for unary +: 'str'
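
The error most likely comes from the comma right after SEARCH_TEXT. With that comma, Python reads the right-hand side as a tuple whose second element starts with a unary plus applied to the string ', ', which is not allowed. Removing the comma turns the expression into ordinary string concatenation. The last loop also writes `text_without_line_breaks`, which is not defined in that script, so it would need to write `line` instead. A possible corrected version, as a sketch (`build_csv_lines` is a made-up helper name, and the bounds check is an addition so a match on the last line does not raise IndexError):

```python
SEARCH_TEXT = 'importantstuffhere'


def build_csv_lines(lines, search_text=SEARCH_TEXT):
    """Pair each line containing search_text with the line that follows it."""
    csv_lines = []
    for line_number, text in enumerate(lines):
        next_line_index = line_number + 1
        # Skip a match on the final line, which has no following line.
        if search_text in text and next_line_index < len(lines):
            next_line_text = lines[next_line_index].strip()
            # No comma after search_text: this is plain concatenation.
            csv_lines.append(search_text + ', ' + next_line_text)
    return csv_lines


sample = [
    '# found importantstuffhere\n',
    'found request could not find identifier. Please check the name and try again.\n',
]
csv_lines_to_write = build_csv_lines(sample)
print(csv_lines_to_write)

# Writing the result out, one entry per line:
# with open('fileName.csv', 'w') as csv_file:
#     for line in csv_lines_to_write:
#         csv_file.write(line + '\n')
```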