Sum a list of strings based on a rule

I have a huge Python list like the following example:

ls = ['name: John', 'John has ', '4 yellow ', 'cars.', 'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']

I would like to join this information into this format:

ls = ['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellow cars.']

I have tried this code:

ls2 = []
with open('names.txt', 'r') as text:
    lines = text.readlines()
    for index, line in enumerate(lines):
        if not line.startswith('name:'):
            ls2.append(lines[index] + lines[index + 1])

But the result was wrong; I got something like:

ls = ['name: John', 'John has 4 yellow', '4 yellow cars.', 'cars.name: Angelina']

Do you have any idea how I can perform this task?

CodePudding user response：

You can use itertools.groupby:

import itertools

ls = ['name: John', 'John has ', '4 yellow ', 'cars.', 'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']

g = itertools.groupby(ls, lambda x: x.startswith('name: '))
output = [''.join(v) for _, v in g]
print(output) # ['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellowcars.']

It groups consecutive items by whether each item starts with 'name: ':

Items that start with 'name: ' form a group (e.g., ['name: John']).
The next few items that don't start with it form a group (e.g., ['John has ', '4 yellow ', 'cars.']).
The next items that do start with it form another group (['name: Angelina']).
... and so on, alternating.

Then join concatenates the strings in each group.
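
Note that the last item in the printed output comes out as 'yellowcars.' because '5 yellow' has no trailing space in the input. As a rough sketch of one way around that, each piece can be stripped and the group joined with a single space. The helper name merge_groups and its prefix parameter are made up for illustration; keeping consecutive name lines as separate items is also an assumption about what you want.

```python
import itertools

ls = ['name: John', 'John has ', '4 yellow ', 'cars.',
      'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']

def merge_groups(items, prefix='name:'):
    """Hypothetical helper: merge runs of non-name items into one string."""
    merged = []
    groups = itertools.groupby(items, key=lambda s: s.startswith(prefix))
    for is_name, group in groups:
        if is_name:
            # Keep every name line as its own item, even if several are adjacent.
            merged.extend(s.strip() for s in group)
        else:
            # Strip each piece, then join with exactly one space.
            merged.append(' '.join(s.strip() for s in group))
    return merged

result = merge_groups(ls)
print(result)
```

With the sample list this should give 'Angelina has 5 yellow cars.' with the space in place.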

CodePudding user response：

Concatenate all the lines that don't begin with name: in a variable, then append that to the result when you get to the next name: line.

ls2 = []
temp_string = ''
for line in lines:
    line = line.rstrip('\n')
    if line.startswith('name:'):
        # flush the text accumulated since the previous name: line
        if temp_string:
            ls2.append(temp_string)
            temp_string = ''
        ls2.append(line)
    else:
        temp_string += line
# append the last set of lines
if temp_string:
    ls2.append(temp_string)
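
Since the list is described as huge, the same idea could also be written as a generator, so that the merged items are produced one at a time and the input can be an open file rather than a full list. This is only a sketch: merge_lines, its prefix parameter and the sample data are names I made up, and the pieces are concatenated without adding any separator.

```python
def merge_lines(lines, prefix='name:'):
    """Hypothetical generator: yield name lines and the merged text after each."""
    buffer = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith(prefix):
            if buffer:
                # Emit the text collected since the previous name line.
                yield ''.join(buffer)
                buffer = []
            yield line
        else:
            buffer.append(line)
    # Emit whatever is left after the last name line.
    if buffer:
        yield ''.join(buffer)

sample = ['name: John\n', 'John has \n', '4 yellow \n', 'cars.\n']
print(list(merge_lines(sample)))
```

Passing the file object directly, as in merge_lines(text) inside the with block, would avoid calling readlines() on a large file.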

CodePudding user response：

I think the logic can be better expressed as "if the current line begins with name:, then append it to a new list, and also join the next three lines into one line and append that line too."

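with open('names.txt', 'r') as text:
    # strip the trailing newlines so the joined pieces form one line
    lines = [line.rstrip('\n') for line in text]

ls2 = []
for i, line in enumerate(lines):
    if line.startswith('name:'):
        ls2.append(line)
        # assumes exactly three lines follow every name: line
        ls2.append(lines[i + 1] + lines[i + 2] + lines[i + 3])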
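
This approach depends on every name: line being followed by exactly three other lines. If that might not always hold, a guarded variant could fail loudly instead of raising an IndexError or silently mixing records. The following is a sketch under that assumption; merge_fixed and body_len are hypothetical names.

```python
ls = ['name: John', 'John has ', '4 yellow ', 'cars.',
      'name: Angelina', 'Angelina has ', '5 yellow', 'cars.']

def merge_fixed(lines, prefix='name:', body_len=3):
    """Hypothetical helper: join a fixed number of lines after each name line."""
    merged = []
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            body = lines[i + 1:i + 1 + body_len]
            # Reject records that are too short or that run into the next name line.
            if len(body) < body_len or any(b.startswith(prefix) for b in body):
                raise ValueError(f'expected {body_len} lines after {line!r}')
            merged.append(line)
            merged.append(''.join(body))
    return merged

print(merge_fixed(ls))
```

Like plain concatenation in general, this adds no separator, so on the sample list the last item still comes out as 'Angelina has 5 yellowcars.'.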

CodePudding user response：

Maybe don't split into all lines but just split the whole file by name lines and polish the whitespace afterwards?

import re
with open('names.txt') as f:
    ls = [re.sub(r'\s+', ' ', s.strip())
          for s in re.split('(name:.*)', f.read())
          if s]

Writing your list back to a file and using the code above, I get exactly the desired output (with spaces where they should be but no duplicate spaces):

['name: John', 'John has 4 yellow cars.', 'name: Angelina', 'Angelina has 5 yellow cars.']
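
To see why this works, it may help to look at what re.split returns when the pattern has a capturing group: the matched name lines are kept in the result, in between the text that separates them. Here is a small sketch on an in-memory string instead of a file. The text variable is made-up sample data, and filtering on s.strip() rather than s is my own tweak so that whitespace-only pieces between adjacent name lines are dropped too.

```python
import re

# Made-up stand-in for the contents of names.txt.
text = ('name: John\nJohn has \n4 yellow \ncars.\n'
        'name: Angelina\nAngelina has \n5 yellow\ncars.\n')

# The capturing group keeps each 'name:...' line in the split result.
# '.' does not match a newline, so each match stops at the end of its line.
parts = re.split(r'(name:.*)', text)

# Drop empty or whitespace-only pieces, then collapse whitespace runs to one space.
ls = [re.sub(r'\s+', ' ', s.strip()) for s in parts if s.strip()]
print(ls)
```

The first element of parts is an empty string because the text starts with a match, which is why some kind of filter is needed.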
