I have a huge Python list like the following example:
ls = ['name: John', 'John has ', '4 yellow ', 'cars.', 'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']
I would like to join this information into the following format:
ls = ['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellow cars.']
I have tried this code:
with open('names.txt', 'r') as text:
    lines = text.readlines()
ls2 = []
for index, line in enumerate(lines):
    if not line.startswith('name:'):
        ls2.append(lines[index] + lines[index + 1])
But the result was wrong, since I get something like:
ls = ['name: John', 'John has 4 yellow', '4 yellow cars.', 'cars.name: Angelina']
Do you have any idea how I can perform this task?
CodePudding user response:
You can use itertools.groupby:
import itertools
ls = ['name: John', 'John has ', '4 yellow ', 'cars.', 'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']
g = itertools.groupby(ls, lambda x: x.startswith('name: '))
output = [''.join(v) for _, v in g]
print(output) # ['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellowcars.']
It groups the items by whether each item starts with 'name: ':
- Items that start with 'name: ' form a group (i.e., ['name: John']).
- Next, a few items that don't do so form a group (i.e., ['John has ', '4 yellow ', 'cars.']).
- Next, items that do so form another group (['name: Angelina']).
- ... and so on, alternating.
Then join concatenates the strings in each group.
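To make the grouping visible, here is a minimal sketch that prints each key together with its group before anything is joined. The name `groups` is only illustrative. Note that groupby merges consecutive items with the same key, so this relies on every 'name: ' item being followed by at least one non-name item.

```python
import itertools

ls = ['name: John', 'John has ', '4 yellow ', 'cars.',
      'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']

# Materialize each group so it can be inspected (groupby yields lazy iterators)
groups = [(key, list(items))
          for key, items in itertools.groupby(ls, lambda x: x.startswith('name: '))]

for key, items in groups:
    print(key, items)
# True ['name: John']
# False ['John has ', '4 yellow ', 'cars.']
# True ['name: Angelina']
# False ['Angelina has ', '5 yellow', 'cars.']
```

Since the pieces are joined with '', any spacing has to come from the items themselves, which is why '5 yellow' followed by 'cars.' ends up as '5 yellowcars.'.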
CodePudding user response:
Concatenate all the lines that don't begin with name: in a variable, then append that to the result when you get to the next name: line.
ls2 = []
temp_string = ''
for line in lines:
    line = line.rstrip('\n')
    if line.startswith('name:'):
        # flush the text collected for the previous name, if any
        if temp_string:
            ls2.append(temp_string)
            temp_string = ''
        ls2.append(line)
    else:
        # accumulate, rather than overwrite, the continuation lines
        temp_string += line
# append the last set of lines
if temp_string:
    ls2.append(temp_string)
CodePudding user response:
I think the logic can be better expressed as "if the current line begins with name:, then append it to a new list, and also join the next three lines into one line and append that line too."
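with open('names.txt', 'r') as text:
    # Drop only the trailing newline so each fragment keeps its own spacing
    lines = [line.rstrip('\n') for line in text]
ls2 = []
for i, line in enumerate(lines):
    if line.startswith('name:'):
        ls2.append(line)
        # Assumes exactly three continuation lines follow each name line
        ls2.append(lines[i + 1] + lines[i + 2] + lines[i + 3])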
CodePudding user response:
Maybe don't split into all lines but just split the whole file by name lines and polish the whitespace afterwards?
import re
with open('names.txt') as f:
    ls = [re.sub(r'\s+', ' ', s.strip())
          for s in re.split('(name:.*)', f.read())
          if s]
Writing your list back to a file and using the code above, I get exactly the desired output (with spaces where they should be but no duplicate spaces):
['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellow cars.']
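For anyone who wants to see the intermediate step, here is a small sketch of the same idea on an in-memory string instead of a file. It assumes the file holds one list item per line, which is simulated here with '\n'.join; the names `text`, `parts` and `result` are only illustrative.

```python
import re

ls = ['name: John', 'John has ', '4 yellow ', 'cars.',
      'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']

# Assumption: the file content is the list items separated by newlines
text = '\n'.join(ls)

# Because the pattern has a capturing group, re.split keeps the matched
# 'name: ...' lines in the result instead of discarding them as delimiters.
parts = re.split('(name:.*)', text)
print(parts)
# ['', 'name: John', '\nJohn has \n4 yellow \ncars.\n',
#  'name: Angelina', '\nAngelina has \n5 yellow\ncars.']

# Drop the empty leading piece, trim each piece, and collapse whitespace runs
result = [re.sub(r'\s+', ' ', s.strip()) for s in parts if s]
print(result)
# ['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellow cars.']
```

The newline between '5 yellow' and 'cars.' is what becomes the single space in the result, so this approach depends on the items being on separate lines.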