I started with this data in a text file.
Contents of data.txt:
text: 8473
second: second text
3rd: 23-54-65-87
txt: 583
sec: sec text
3) 5343-436654-98989
And I changed it into a list like this:
['text:', '8473', 'second:', 'second', 'text', '3rd:', '23-54-65-87', 'txt:', '583', 'sec:', 'sec', 'text', '3)', '5343-436654-98989']
I then removed every item containing a colon (or a closing parenthesis). The next step in the program below manually merges 'second' with 'text' and then 'sec' with 'text'. This won't work in a program with an undetermined amount of data in the above format, so I want to do this in a loop that produces the following result (note that 'second text' and 'sec text' are now each one item):
['8473', 'second text', '23-54-65-87', '583', 'sec text', '5343-436654-98989']
I can only find ways to merge every pair of items. I can't find a way to merge an item with the next one once in every group of 3 items, which is what I want.
Here is the program so far:
file = 'data.txt'
corrected = []
one = []
two = []
three = []
full = [one, two, three]
with open(file, 'r') as f:
    contents = f.read()
    list = contents.split()
    print(list)
    for item in list:
        # Drop the labels: they contain either ':' or ')'
        if ":" not in item:
            if ")" not in item:
                corrected.append(item)
# Manual merge: this is the part I want to replace with a loop
corrected[1] = corrected[1] + ' ' + corrected[2]
del corrected[2]
corrected[4] = corrected[4] + ' ' + corrected[5]
del corrected[5]
print(f"{corrected}\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n")
# Distribute the values over three lists by their position in each group
for item in corrected[::3]:
    one.append(item)
for item in corrected[1::3]:
    two.append(item)
for item in corrected[2::3]:
    three.append(item)
index = 1
for item in full:
    print(f"{index}:{item}")
    index += 1
Current and desired output:
1:['8473', '583']
2:['second text', 'sec text']
3:['23-54-65-87', '5343-436654-98989']
CodePudding user response:
There are several ways to do this. It seems like your input is structured in blocks of 3 lines, separated by an empty line. You could first split the input into such blocks, then split each block into separate lines, and then extract from each line the part after the first space.
Here is how that looks:
result = None
for block in contents.split("\n\n"):
    lines = block.splitlines()
    if not result:
        # One output list per line position in the first block
        result = [[] for _ in lines]
    for i, line in enumerate(lines):
        # Keep only the part after the first space
        result[i].append(line.split(" ", 1).pop())
print(result)
This outputs:
[['8473', '583'], ['second text', 'sec text'], ['23-54-65-87', '5343-436654-98989']]
The code will break if there is a block that has a greater number of lines than the first block. It assumes the input is well structured.
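If blocks of unequal length are a possibility, one way around that limitation is to collect the values per block first and then transpose them. The following is only a sketch under that assumption: the function name columns_from_blocks and the sample string are made up for illustration, and short blocks are padded with None via itertools.zip_longest.

```python
from itertools import zip_longest


def columns_from_blocks(contents):
    """Group the value of each line by its position within its block."""
    blocks = []
    for block in contents.split("\n\n"):
        # Keep only the part after the first space on each non-empty line.
        values = [
            line.split(" ", 1)[-1]
            for line in block.splitlines()
            if line.strip()
        ]
        if values:
            blocks.append(values)
    # Transpose blocks into columns; shorter blocks are padded with None.
    return [list(column) for column in zip_longest(*blocks)]


# Hypothetical sample mirroring data.txt, with an empty line between blocks.
sample = (
    "text: 8473\nsecond: second text\n3rd: 23-54-65-87\n"
    "\n"
    "txt: 583\nsec: sec text\n3) 5343-436654-98989"
)
columns = columns_from_blocks(sample)
print(columns)
```

For the sample above this gives the same three lists as before. Whether None is the right filler for a missing line depends on what you do with the result afterwards.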
CodePudding user response:
Sounds to me like you just want to discard the first word of every row:
text='''text: 8473
second: second text
3rd: 23-54-65-87
txt: 583
sec: sec text
3) 5343-436654-98989'''
result = [row.split(maxsplit=1)[1] for row in text.split('\n') if row]
print(result)
# ['8473', 'second text', '23-54-65-87', '583', 'sec text', '5343-436654-98989']
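If the text is read from a file, then the split on '\n' is implicit. Each row still ends with its newline in that case, so strip it before splitting and before testing for empty rows:
with open('data.txt', 'r') as f:
    result = [row.strip().split(maxsplit=1)[1] for row in f if row.strip()]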
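To get from that flat list to the three numbered lists the question asks for, the values can be sliced by position. This is a rough sketch, not a definitive implementation: the helper names values_from_lines and group_by_position are invented here, the group size of 3 is taken from the sample data, and a list of strings stands in for the open file so the example runs without data.txt.

```python
def values_from_lines(lines):
    """Return the text after the first word of every non-empty line."""
    values = []
    for row in lines:
        parts = row.strip().split(maxsplit=1)
        if len(parts) == 2:  # skips blank lines and lines holding a label only
            values.append(parts[1])
    return values


def group_by_position(values, size=3):
    """Put every size-th value into the same list, starting at each offset."""
    return [values[i::size] for i in range(size)]


# Stand-in for iterating over open('data.txt'); an open file works the same way.
lines = [
    "text: 8473\n",
    "second: second text\n",
    "3rd: 23-54-65-87\n",
    "\n",
    "txt: 583\n",
    "sec: sec text\n",
    "3) 5343-436654-98989\n",
]
flat = values_from_lines(lines)
grouped = group_by_position(flat)
for index, column in enumerate(grouped, start=1):
    print(f"{index}:{column}")
```

Because the values are never split on the inner space, 'second text' and 'sec text' stay whole and no manual merging is needed.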
CodePudding user response:
Solution
This is a job for regex:
import re
with open("data.txt", "r") as f:
    data = f.read()
p = re.compile(r"(?:text|txt|second|sec|3rd|3\)):? (.*)")
corrected = p.findall(data)
Demo
In [1]: import re
In [2]: s = """text: 8473
...: second: second text
...: 3rd: 23-54-65-87
...:
...: txt: 583
...: sec: sec text
...: 3) 5343-436654-98989"""
In [3]: p = re.compile(r"(?:text|txt|second|sec|3rd|3\)):? (.*)")
In [4]: p.findall(s)
Out[4]: ['8473', 'second text', '23-54-65-87', '583', 'sec text', '5343-436654-98989']
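The pattern above lists every label explicitly, so it has to be extended whenever a new label appears. If it is safe to assume that the label is always the first run of non-space characters on a line, a more general pattern could look like the sketch below (the pattern and the sample string are illustrative, not part of the original program).

```python
import re

# ^\S+    the label: a run of non-space characters at the start of a line
# [ \t]+  the separator between label and value
# (.*)$   the value: everything up to the end of that line
pattern = re.compile(r"^\S+[ \t]+(.*)$", re.MULTILINE)

sample = """text: 8473
second: second text
3rd: 23-54-65-87

txt: 583
sec: sec text
3) 5343-436654-98989"""

values = pattern.findall(sample)
print(values)
```

Empty lines do not match because they contain no label, so they are skipped without extra filtering.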