How to split the data based on some key elements in the list using Python?

The following is my returned list:

['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']

From this list I want to join the elements of each group into one string. A group starts right after the element 'ccc', or after 'ddd' when 'ddd' immediately follows 'ccc', and runs up to the next element 'aaa' (or to the end of the list). That should give me the following strings:

ABN AMRO Bank N.V.
Your monthly statement is
available under Self service >
Download statements or u
receive them by mail.

/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF

Settlement FX/MM
Trans. Ref. 0035979579
Deal Ticket ID 6225447

Can anyone help me, please? I got lost in nested for loops while attempting this. Thanks!

CodePudding user response：

You can replace aaa, bbb, ccc and ddd with single spaces, join everything with spaces, then split on runs of three or more whitespace characters:

import re
data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
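# Replace each marker with one space, so a run of markers becomes a long run of spaces
data = [' ' if i in ['aaa', 'bbb', 'ccc', 'ddd'] else i for i in data]
data = ' '.join(data).strip()
# Three or more consecutive whitespace characters mark a group boundary
data = re.split(r'\s{3,}', data)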

This will get you a list of the desired groups

print('\n\n'.join(data)):

ABN AMRO Bank N.V. 
Your monthly statement is available under Self service > 
Download statements or u receive them by mail.

/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/ 
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF

Settlement FX/MM 
Trans. Ref. 0035979579 Deal Ticket ID 6225447
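
The approach above depends on counting whitespace, so it assumes that markers always arrive at least three in a row. As a rough alternative sketch (not part of the answer above), itertools.groupby can split on the markers directly. The names MARKERS and split_on_markers are made up for this illustration, and the sketch assumes the marker strings never occur as real data:

```python
from itertools import groupby

# Assumption: these strings are only ever separators, never real data
MARKERS = {'aaa', 'bbb', 'ccc', 'ddd'}

def split_on_markers(tokens, markers=MARKERS):
    """Join every run of consecutive non-marker tokens into one string."""
    return [
        ' '.join(run)
        for is_marker, run in groupby(tokens, key=lambda t: t in markers)
        if not is_marker
    ]

sample = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO',
          'aaa', 'bbb', 'ccc', 'ddd', '/TRTP', '\nPointbar',
          'aaa', 'bbb', 'ccc', 'Deal', '6225447']
groups = split_on_markers(sample)
print('\n\n'.join(groups))
```

Because grouping happens on the tokens themselves, this should keep working even if a single marker appears on its own between two groups.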

CodePudding user response：

You could try this:

L = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
     '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
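i = 0        # offset of the unprocessed remainder of L
S = None     # tokens of the group currently being collected
while True:
    try:
        _L = L[i:]
        o = _L.index('ccc') + 1   # a group starts right after 'ccc'
        if _L[o] == 'ddd':        # ...or after 'ddd' when it is present
            o += 1
        S = []
        while _L[o] != 'aaa':     # collect tokens until the next 'aaa'
            S.append(_L[o])
            o += 1
        print(' '.join(S))
        S = None
        i += o
    except (IndexError, ValueError):
        # IndexError: ran off the end of the list; ValueError: no more 'ccc'
        if S:
            print(' '.join(S))    # flush the final, unterminated group
        break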
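
The same rule can also be written as a single pass over the list, without slicing or index arithmetic. This is only a sketch of one possible restructuring: the function name collect_groups is hypothetical, and it assumes 'aaa' and 'ccc' never appear as data inside a group (a 'ccc' inside a group would restart it here).

```python
def collect_groups(tokens):
    """Return one joined string per group found between 'ccc' and 'aaa'."""
    groups = []
    current = None   # None means we are not inside a group
    prev = None      # previous token, used to detect 'ddd' right after 'ccc'
    for tok in tokens:
        if tok == 'aaa':
            # 'aaa' closes the current group, if there is one
            if current:
                groups.append(' '.join(current))
            current = None
        elif tok == 'ccc':
            # 'ccc' opens a new group
            current = []
        elif current is not None:
            # skip a 'ddd' that immediately follows 'ccc'; keep everything else
            if not (tok == 'ddd' and prev == 'ccc'):
                current.append(tok)
        prev = tok
    if current:
        # the last group has no closing 'aaa'
        groups.append(' '.join(current))
    return groups

sample = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO',
          'aaa', 'bbb', 'ccc', 'ddd', '/TRTP',
          'aaa', 'bbb', 'ccc', 'Deal', '6225447']
result = collect_groups(sample)
print(result)
```

Returning a list instead of printing inside the loop makes it easier to reuse the groups afterwards.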

CodePudding user response：

You could try and use regex as follows:


import re

data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', 
  '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 
  'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u',
  'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
  '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', 
  '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 
  'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 
  'Ticket', 'ID', '6225447']

# Flatten the list into one space-separated string
one_line = ' '.join(data)

# Substitute the groups 'aaa bbb ccc' and 'aaa bbb ccc ddd' with newline chars
print(re.sub(r'(aaa bbb ccc) | (aaa bbb ccc ddd)', '\n\n', one_line).lstrip())

Output:

ABN AMRO Bank N.V. 
Your monthly statement is available under Self service > 
Download statements or u receive them by mail.

 /TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/ 
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF 

Settlement FX/MM 
Trans. Ref. 0035979579 Deal Ticket ID 6225447
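
The output above has a stray leading space before /TRTP/SEPA, because the pattern only consumes the space on one side of the marker run. If a list of groups is wanted instead of printed text, a variant along the following lines might work. It is a sketch only: split_with_regex is a hypothetical name, and the pattern assumes the markers always appear in the order aaa bbb ccc, optionally followed by ddd.

```python
import re

def split_with_regex(tokens):
    """Split the space-joined tokens on 'aaa bbb ccc' with an optional 'ddd'."""
    one_line = ' '.join(tokens)
    # (?:^| ) and (?: |$) consume the joining space on each side of the
    # marker run, so no stray spaces are left at the edges of a group
    pattern = r'(?:^| )aaa bbb ccc(?: ddd)?(?: |$)'
    # drop the empty string produced when the text starts or ends with markers
    return [part for part in re.split(pattern, one_line) if part]

sample = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO',
          'aaa', 'bbb', 'ccc', 'ddd', '/TRTP', '\nPointbar',
          'aaa', 'bbb', 'ccc', 'Deal', '6225447']
parts = split_with_regex(sample)
print('\n\n'.join(parts))
```

Line breaks that are part of the data, such as the one in '\nPointbar', are left untouched.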