I need to read a text file, file.txt, that looks like this:
NaN
NaN
[ From To Type When Price
0 SillyZir 0x4a34 Bid June 18th, 2022 50000]
NaN
[ From To Type When Price
0 SillyZir Klima#3171 Bid June 16th, 2022 60000]
I have tried this code:
with open("file.txt") as f:
    lines = [line.rstrip() for line in f]
but my output looks like this, which is not what I want:
['NaN',
'NaN',
'[ From To Type When Price',
'0 SillyZir 0x4a34 Bid June 18th, 2022 50000]',
'NaN',
'[ From To Type When Price',
'0 SillyZir Klima#3171 Bid June 16th, 2022 60000]']
I would like to access the list inside the list, but the code splits each list into two separate strings after "Price", and I don't know how to work around that.
I have done some research but couldn't find anything that works. I'm fairly new to Python, so I would really appreciate some help!
Thank you!
CodePudding user response:
You can use basic logic with branching if-statements, and a regex to split your lists on spaces or tabs:
import re

inside = False  # True while we are between '[' and ']'
result = []
sublist = []
for line in lines:
    if not inside:
        if line[0] == '[':
            inside = True
            words = re.split('[ \t]+', line)
            sublist.extend(words[1:])  # skip the leading '['
        else:
            result.append(line)
    else:
        words = re.split('[ \t]+', line)
        if line[-1] == ']':
            inside = False
            sublist.extend(words[:-1])  # drops the last token, which carries the ']'
            result.append(sublist)
            sublist = []
        else:
            sublist.extend(words[:-1])
Result:
[
'NaN',
'NaN',
['From', 'To', 'Type', 'When', 'Price', '0', 'SillyZir', '0x4a34', 'Bid', 'June', '18th,', '2022'],
'NaN',
['From', 'To', 'Type', 'When', 'Price', '0', 'SillyZir', 'Klima#3171', 'Bid', 'June', '16th,', '2022']
]
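Note that each sublist in the result above stops at '2022', so the price is lost together with the closing bracket. If you need the price as well, here is a minimal sketch of a variant that strips the brackets before splitting. The function name group_bracketed is made up for illustration, and it uses plain str.split() instead of the regex, which also splits on any run of spaces or tabs:

```python
def group_bracketed(lines):
    """Collect lines between '[' and ']' into one sublist; keep other lines as-is."""
    result = []
    sublist = None  # None means we are outside a bracketed block
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if sublist is None and line.startswith('['):
            sublist = []  # a new bracketed block starts here
        if sublist is None:
            result.append(line)  # plain line such as 'NaN'
            continue
        closes = line.endswith(']')
        # Remove the brackets first so no real token is thrown away with them.
        sublist.extend(line.strip('[]').split())
        if closes:
            result.append(sublist)
            sublist = None
    return result


sample = [
    'NaN',
    '[ From To Type When Price',
    '0 SillyZir 0x4a34 Bid June 18th, 2022 50000]',
]
print(group_bracketed(sample))
```

With the sample lines, the sublist now ends with '50000' instead of '2022'.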
CodePudding user response:
I have processed the list and extracted the words; see if the result is what you need:
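with open('file.txt') as f:
    lis = f.read().split('\n')

result = []
for e in lis:
    if e.startswith('['):
        # header line: drop the opening bracket and split into column names
        result.append(e.strip('[').split())
    elif e[:1].isdigit():  # e[:1] avoids an IndexError on blank lines
        # data row: e[3:] skips the leading row index, then drop the closing bracket
        result.append(e[3:].strip(']').strip().split(' '))
    elif e == 'NaN':
        result.append(e)
print(result)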
Output:
['NaN',
'NaN',
['From', 'To', 'Type', 'When', 'Price'],
['SillyZir', '0x4a34', 'Bid', 'June 18th, 2022', '50000'],
'NaN',
['From', 'To', 'Type', 'When', 'Price'],
['SillyZir', 'Klima#3171', 'Bid', 'June 16th, 2022', '60000']]
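If you would rather look values up by column name, here is a rough sketch that pairs each header with its row. It rests on assumptions taken only from the sample in the question: every bracketed block holds exactly one header line and one data row, the columns are From, To, Type, When, Price, and only the When column contains spaces. The name parse_bids is hypothetical.

```python
import re


def parse_bids(text):
    """Return one dict per bracketed block, and None for each 'NaN' line."""
    records = []
    # Match either a line that is exactly NaN, or everything between '[' and ']'.
    pattern = re.compile(r'^NaN$|\[(.*?)\]', flags=re.DOTALL | re.MULTILINE)
    for match in pattern.finditer(text):
        block = match.group(1)
        if block is None:  # the NaN alternative matched
            records.append(None)
            continue
        # Assumption: one header line followed by one data row.
        header_line, row_line = block.strip().splitlines()
        header = header_line.split()
        tokens = row_line.split()
        # tokens[0] is the row index; the date is whatever sits between
        # the first three fields and the final price.
        values = tokens[1:4] + [' '.join(tokens[4:-1]), tokens[-1]]
        records.append(dict(zip(header, values)))
    return records


sample_text = (
    "NaN\n"
    "[ From To Type When Price\n"
    "0 SillyZir 0x4a34 Bid June 18th, 2022 50000]\n"
)
records = parse_bids(sample_text)
print(records[1]['When'], records[1]['Price'])
```

Because the row is split on any run of whitespace and the date is rejoined afterwards, this does not depend on how many spaces separate the columns in the real file.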