How to split the data based on some key elements in the list using Python?

Time:10-19

The following is my returned list:

['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']

From this list I want to join the elements of each group into one string. A group starts right after the element 'ccc' (or right after 'ddd', when 'ddd' follows 'ccc') and runs up to the next element 'aaa'. That should give me the following strings:

ABN AMRO Bank N.V.
Your monthly statement is
available under Self service >
Download statements or u
receive them by mail.

/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF

Settlement FX/MM
Trans. Ref. 0035979579
Deal Ticket ID 6225447

Can anyone help me, please? I got lost in nested for loops while attempting this. Thanks!

CodePudding user response:

You can replace aaa, bbb, ccc and ddd with blanks, then split on runs of three or more whitespace characters:

import re
data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
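# Blank out the marker tokens so each run of markers becomes a long stretch of spaces
data = [' ' if i in ['aaa', 'bbb', 'ccc', 'ddd'] else i for i in data]
data = ' '.join(data).strip()
# Split wherever three or more whitespace characters occur in a row
data = re.split(r'\s\s\s+', data)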

This will get you a list of the desired groups.

print('\n\n'.join(data)) then shows:

ABN AMRO Bank N.V. 
Your monthly statement is available under Self service > 
Download statements or u receive them by mail.

/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/ 
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF

Settlement FX/MM 
Trans. Ref. 0035979579 Deal Ticket ID 6225447
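
The groups printed above keep a space in front of each embedded newline, because tokens such as '\nYour' carry the newline themselves and are still joined with a space. If that matters, a small cleanup step could look like the sketch below. The helper name tidy is made up for this illustration and is not part of the answer above.

```python
import re

def tidy(group):
    """Remove spaces or tabs that sit directly before a newline, then trim both ends."""
    # Hypothetical helper: only whitespace immediately before '\n' is touched.
    return re.sub(r"[ \t]+\n", "\n", group).strip()

if __name__ == "__main__":
    sample = "ABN AMRO Bank N.V. \nYour monthly statement is"
    print(tidy(sample))
```

It would be applied to each group before printing, for example print('\n\n'.join(tidy(g) for g in data)).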

CodePudding user response:

You could try this:

L = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
     '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
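i = 0
S = None
while True:
    try:
        _L = L[i:]
        # Position just after the next 'ccc' marker
        o = _L.index('ccc') + 1
        # Skip the optional 'ddd' marker
        if _L[o] == 'ddd':
            o += 1
        S = []
        # Collect tokens until the next 'aaa' marker
        while _L[o] != 'aaa':
            S.append(_L[o])
            o += 1
        print(' '.join(S))
        S = None
        i += o
    except (IndexError, ValueError):
        # End of list reached: print the last, unterminated group if there is one
        if S:
            print(' '.join(S))
        break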
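
The same single pass can also be written without manual index bookkeeping. The following is only a sketch of an alternative, not the code from the answer above: it uses itertools.groupby and assumes that the marker tokens 'aaa', 'bbb', 'ccc' and 'ddd' never occur as real data. The names MARKERS and split_groups are invented for the example.

```python
from itertools import groupby

# Assumption: these tokens only ever act as separators, never as data.
MARKERS = {"aaa", "bbb", "ccc", "ddd"}

def split_groups(tokens):
    """Return one joined string per run of consecutive non-marker tokens."""
    groups = []
    # groupby yields alternating runs of marker and non-marker tokens.
    for is_marker, run in groupby(tokens, key=lambda tok: tok in MARKERS):
        if not is_marker:
            groups.append(" ".join(run))
    return groups

if __name__ == "__main__":
    sample = ["aaa", "bbb", "ccc", "ABN", "AMRO",
              "aaa", "bbb", "ccc", "ddd", "/TRTP/SEPA", "\nPointbar",
              "aaa", "bbb", "ccc", "Settlement", "FX/MM"]
    for group in split_groups(sample):
        print(group)
        print()
```

Because any run of markers counts as one separator, the optional 'ddd' needs no special case. The trade-off is that the order of the markers is not checked.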

CodePudding user response:

You could try and use regex as follows:


import re

data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', 
  '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 
  'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u',
  'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
  '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', 
  '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 
  'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 
  'Ticket', 'ID', '6225447']

#flatten the list
one_line = ' '.join(data)

#substitute groups 'aaa bbb ccc' and 'aaa bbb ccc ddd' with newline chars
print(re.sub(r'(aaa bbb ccc) | (aaa bbb ccc ddd)', '\n\n', one_line).lstrip())

output:

ABN AMRO Bank N.V. 
Your monthly statement is available under Self service > 
Download statements or u receive them by mail.

 /TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/ 
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF 

Settlement FX/MM 
Trans. Ref. 0035979579 Deal Ticket ID 6225447
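
In the output above, the second group starts with a stray space and some groups end with one, because the two alternatives in the pattern each consume a space on one side only. A possible variant, sketched here under the assumption that the header is always exactly 'aaa bbb ccc' with an optional 'ddd', makes 'ddd' an optional part of a single pattern and returns the groups as a list. HEADER and split_on_header are names chosen for this example.

```python
import re

# Optional space, the fixed header, an optional " ddd", then an optional space.
# Assumption: the text "aaa bbb ccc" never appears inside real data.
HEADER = re.compile(r" ?aaa bbb ccc(?: ddd)? ?")

def split_on_header(tokens):
    """Join the tokens into one line and split it at every header occurrence."""
    one_line = " ".join(tokens)
    # The piece before the leading header is empty, so empty pieces are dropped.
    return [part for part in HEADER.split(one_line) if part]

if __name__ == "__main__":
    sample = ["aaa", "bbb", "ccc", "ABN", "AMRO",
              "aaa", "bbb", "ccc", "ddd", "/TRTP/SEPA", "\nPointbar",
              "aaa", "bbb", "ccc", "Settlement", "FX/MM"]
    print("\n\n".join(split_on_header(sample)))
```

The optional group is non-capturing, so re.split does not add the matched 'ddd' to the result list.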