The following is the list I get back:
['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
From this list I want to join the elements that come after each 'ccc' (or after 'ddd', when 'ddd' directly follows 'ccc') up to the next 'aaa', so that I get the following strings:
ABN AMRO Bank N.V.
Your monthly statement is
available under Self service >
Download statements or u
receive them by mail.
/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF
Settlement FX/MM
Trans. Ref. 0035979579
Deal Ticket ID 6225447
Can anyone help me, please? I got tangled up in nested for loops while attempting this. Thanks!
CodePudding user response:
You can replace aaa, bbb, ccc and ddd with a single space each, join everything with spaces, and then split on runs of three or more whitespace characters:
import re
data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
data = [' ' if i in ['aaa', 'bbb', 'ccc', 'ddd'] else i for i in data]
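# each marker is now a space, so a run of markers becomes 3+ whitespace characters
data = ' '.join(data).strip()
# split on those long whitespace runs; single spaces inside a group are kept
data = re.split(r'\s\s\s+', data)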
This gives you a list of the desired groups, which you can print with:
print('\n\n'.join(data))
Output:
ABN AMRO Bank N.V.
Your monthly statement is available under Self service >
Download statements or u receive them by mail.
/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF
Settlement FX/MM
Trans. Ref. 0035979579 Deal Ticket ID 6225447
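To see why splitting on three or more whitespace characters works, here is a minimal sketch on a shortened list. The `sample` list and the variable names are made up for illustration; only the technique comes from the answer above. Each marker turns into a space, and joining with spaces pads every run of markers out to at least three whitespace characters, while an element such as '\nBank' only produces two (a space plus the newline).

```python
import re

markers = ['aaa', 'bbb', 'ccc', 'ddd']
# shortened, made-up sample in the same shape as the real list
sample = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', '\nBank',
          'aaa', 'bbb', 'ccc', 'ddd', 'Deal', 'Ticket']

# replace every marker with a single space
blanked = [' ' if item in markers else item for item in sample]
# joining with ' ' turns each run of markers into a long run of spaces
joined = ' '.join(blanked).strip()
# split only where three or more whitespace characters sit together
groups = re.split(r'\s{3,}', joined)
print(groups)
```

Note that even a single isolated marker would produce three spaces in a row and therefore split the text, so this relies on the markers never appearing as ordinary words inside a group.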
CodePudding user response:
You could try this:
L = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
'/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
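i = 0      # absolute position in L where the next search starts
S = None   # elements of the group currently being collected
while True:
    try:
        _L = L[i:]
        # the group starts right after the next 'ccc' ...
        o = _L.index('ccc') + 1
        # ... or after 'ddd' when it directly follows 'ccc'
        if _L[o] == 'ddd':
            o += 1
        S = []
        # collect elements until the next 'aaa'
        while _L[o] != 'aaa':
            S.append(_L[o])
            o += 1
        print(' '.join(S))
        S = None
        # o is relative to the slice, so advance i by it
        i += o
    except (IndexError, ValueError):
        # ValueError: no more 'ccc'; IndexError: ran off the end of the list
        if S:
            print(' '.join(S))
        break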
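If the index bookkeeping in the loop above feels fragile, the same rule can be sketched as a single pass over the list. This is only an illustration under the assumptions stated in the question: a group opens after 'ccc', a 'ddd' directly after 'ccc' is skipped, and 'aaa' closes the group. The function name `iter_groups` is hypothetical.

```python
def iter_groups(items):
    """Yield the joined text of each group between 'ccc' (plus optional 'ddd') and 'aaa'."""
    current = None   # None means we are not inside a group
    prev = None      # previous element, used to detect 'ddd' right after 'ccc'
    for item in items:
        if item == 'ccc':
            current = []                     # a new group starts after 'ccc'
        elif item == 'aaa':
            if current:
                yield ' '.join(current)      # 'aaa' closes the open group
            current = None
        elif current is not None:
            # skip a 'ddd' that directly follows 'ccc'
            if not (item == 'ddd' and prev == 'ccc'):
                current.append(item)
        prev = item
    if current:
        yield ' '.join(current)              # last group has no closing 'aaa'


for group in iter_groups(['aaa', 'bbb', 'ccc', 'ABN', 'AMRO',
                          'aaa', 'bbb', 'ccc', 'ddd', 'Deal', 'Ticket']):
    print(group)
```

Because it is a generator, you can also collect the groups with `list(iter_groups(L))` instead of printing them.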
CodePudding user response:
You could try and use regex as follows:
import re
data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.',
'\nYour', 'monthly', 'statement', 'is', 'available', 'under',
'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u',
'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
'/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/',
'\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb',
'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal',
'Ticket', 'ID', '6225447']
# flatten the list into one string
one_line = ' '.join(data)
# substitute each 'aaa bbb ccc' marker run (with an optional trailing 'ddd') with a blank line
print(re.sub(r'\s*aaa bbb ccc(?: ddd)?\s*', '\n\n', one_line).lstrip())
Output:
ABN AMRO Bank N.V.
Your monthly statement is available under Self service >
Download statements or u receive them by mail.
/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF
Settlement FX/MM
Trans. Ref. 0035979579 Deal Ticket ID 6225447
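If you need the groups as a list rather than printed text, the same pattern idea can be used with `re.split`. This is a rough sketch, not part of the answer above: the helper name `split_groups` is made up, and it assumes the markers always appear in the order 'aaa bbb ccc' with an optional 'ddd'.

```python
import re

def split_groups(items):
    """Return the text between marker runs as a list of strings."""
    one_line = ' '.join(items)
    # the marker run, an optional ' ddd', and any whitespace around it act as the separator
    parts = re.split(r'\s*\baaa bbb ccc(?: ddd)?\b\s*', one_line)
    # drop the empty string produced when the input starts or ends with a marker run
    return [part for part in parts if part]


print(split_groups(['aaa', 'bbb', 'ccc', 'ABN', 'AMRO',
                    'aaa', 'bbb', 'ccc', 'ddd', 'Deal', 'Ticket']))
```

The `\b` word boundaries keep the pattern from matching inside longer tokens that merely contain the marker text.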