So I have this txt file:
Haiku
5 *
7 *
5 *

Limerick
8 A
8 A
5 B
5 B
8 A
And I want to write a function that returns something like this:
[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
I've tried this:
small_pf = open('datasets/poetry_forms_small.txt')
lst = []
for line in small_pf:
    lst.append(line.strip())
small_pf.close()
print(lst)
At the end I end up with this:
['Haiku', '5 *', '7 *', '5 *', '', 'Limerick', '8 A', '8 A', '5 B', '5 B', '8 A']
I have two problems: the result is one big flat list instead of one list per poem, and the values on each line are still stuck together, like '5 *' or '8 A'. I honestly don't know where to start, so I'd like some guidance on how to solve both. Any help would be greatly appreciated.
CodePudding user response:
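When you reach an empty line, don't add it to the result. Instead, save the temporary list you've been filling, start a new one, and continue. Calling split() on each non-empty line separates values like '5 *' into '5' and '*'.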
lst = []
with open('test.txt') as small_pf:
    tmp_list = []
    for line in small_pf:
        line = line.rstrip("\n")
        if line == "":
            lst.append(tmp_list)
            tmp_list = []
        else:
            tmp_list.extend(line.split())
    if tmp_list:  # add last one
        lst.append(tmp_list)
print(lst)
# [['Haiku', '5', '*', '7', '*', '5', '*'],
#  ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
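Since the question asks for a function, the same loop could be wrapped up roughly as follows. This is only a sketch: the name group_lines is made up for illustration, and it accepts any iterable of lines (an open file works too) so it can be tried without a file on disk. Unlike the loop above, it also skips repeated blank lines instead of adding empty lists.

```python
def group_lines(lines):
    """Return one list of tokens per blank-line-separated block.

    group_lines is a hypothetical helper name, not something from the question.
    """
    groups = []
    current = []
    for line in lines:
        if line.strip() == "":
            # Blank line: close the current block, if it holds anything.
            if current:
                groups.append(current)
                current = []
        else:
            # split() with no arguments splits on any run of whitespace.
            current.extend(line.split())
    if current:  # the last block has no blank line after it
        groups.append(current)
    return groups


sample = ["Haiku\n", "5 *\n", "7 *\n", "5 *\n", "\n",
          "Limerick\n", "8 A\n", "8 A\n", "5 B\n", "5 B\n", "8 A\n"]
print(group_lines(sample))
```

With the real file, the call would be something like group_lines(small_pf) inside a with open(...) as small_pf: block.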
CodePudding user response:
First split the file into sections on blank lines (\n\n), then split each section on any whitespace (newlines or spaces).
lst = [section.split() for section in small_pf.read().split('\n\n')]
Result:
[['Haiku', '5', '*', '7', '*', '5', '*'],
['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
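For anyone who wants to try this without the file, here is a small self-contained sketch. It assumes the sections are separated by exactly one blank line; io.StringIO is only a stand-in for the open file, and the extra filter for empty sections is my addition, not part of the one-liner above.

```python
import io

# Stand-in for the open file from the question (same contents, assumed layout).
small_pf = io.StringIO(
    "Haiku\n5 *\n7 *\n5 *\n\nLimerick\n8 A\n8 A\n5 B\n5 B\n8 A\n"
)

# Split into sections on blank lines, then split each section on whitespace.
sections = small_pf.read().split("\n\n")
# Skip sections that contain only whitespace (e.g. extra blank lines at the end).
lst = [section.split() for section in sections if section.strip()]
print(lst)
```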
CodePudding user response:
Solution without using extra modules
lines = small_pf.readlines()
result = []
tempList = []
for index, line in enumerate(lines):
    if line != "\n":
        # add the values of this line first, so the last line is not lost
        tempList.extend(line.split())
    if line == "\n" or index == len(lines) - 1:
        # blank line or end of file: store the finished group
        result.append(tempList.copy())
        tempList.clear()
result
Solution with module
You can use regex to solve your problem:
import re
text = small_pf.read()
# strip() avoids an empty string at the end when the file ends with a newline
[re.split(r"\s+", x) for x in re.split(r"\n{2,}", text.strip())]
Output
[['Haiku', '5', '*', '7', '*', '5', '*'],
['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
CodePudding user response:
This approach assumes that every non-empty line starts either with a decimal digit or with some other character. A line that does not start with a digit begins a new list, and the whole line (as one string, with trailing whitespace removed) becomes its first element. A line that starts with a digit is stripped of trailing whitespace, split on spaces, and its parts are added to the most recently created list. Note that the file must begin with a non-digit line, otherwise there is no list to extend yet.
lst = []
with open("blankpaper.txt") as f:
    for line in f:
        # ignore empty lines
        if line.rstrip() == '':
            continue
        if not line[0].isdecimal():
            new_list = [line.rstrip()]
            lst.append(new_list)
            continue
        new_list.extend(line.rstrip().split(" "))
print(lst)
Output
[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
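The same idea could also be written as a function. This is a rough sketch under the assumptions described above; group_by_header is a made-up name, and the explicit error for a file that starts with a digit line is my addition.

```python
def group_by_header(lines):
    """Start a new list at every line that does not begin with a decimal digit.

    group_by_header is a hypothetical helper name used only for illustration.
    """
    groups = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore empty lines
        if not text[0].isdecimal():
            # Header line: kept whole, even if it contains spaces.
            groups.append([text])
        elif groups:
            groups[-1].extend(text.split())
        else:
            raise ValueError("data line found before any header line")
    return groups


print(group_by_header(["Haiku\n", "5 *\n", "\n", "Blank Verse\n", "10 *\n"]))
```

Because a header is kept as a single element, a multi-word title such as 'Blank Verse' stays in one piece, which the whitespace-splitting answers above would not do.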
I hope this helps. If there are any questions, please let me know.