I'm in the middle of cleaning up some data in Python. I get a load of lists with 6 entries which I eventually want to put into a dataframe, however before doing that I'd like to loop through and check if entry 1 in the list is the only non-empty string in the list. For instance I have a list called transaction_list
:
transaction_list = [
['01Oct', 'CMS ACC TRF /MISC CREDIT, AG', '', '', '5,000.00', '11,377.00'],
['', 'MULTI RESOURCES LIMITED,', '', '', '', ''],
['', 'CMS19274001077, 1094175DAAA107', '', '', '', ''],
['01Oct', 'INTER ACC CREDIT, SY', '', '', '1,000.00', '12,732.07'],
['', 'WTA CO3 (MAUR) LIMITED,', '', '', '', ''],
['', 'CMS19274009397, 729981UAAA298', '', '', '', ''],
['01Oct', 'HOUSE CHEQUE PRESENTED, , 639584,', '639584', '400.00', '', '12,332.07'],
['01Oct', 'CHEQUE PROCESSING FEE, , ,', '', '0.50', '', '12,331.57']
]
I basically need to loop through and store a list in memory, however if the list only has the first entry populated and the other entries are empty strings, I want to merge that entry via a line break with the previous first entry of the loop and then delete that list. So the final should look something like this:
transaction_list = [
['01Oct', 'CMS ACC TRF /MISC CREDIT, AG \n MULTI RESOURCES LIMITED \n CMS19274001077, 1094175DAAA107',
'', '', '5,000.00', '11,377.00'],
['01Oct', 'INTER ACC CREDIT, SY \n WTA CO3 (MAUR) LIMITED \n CMS19274009397, 729981UAAA298',
'', '', '1,000.00', '12,732.07'],
['01Oct', 'HOUSE CHEQUE PRESENTED, , 639584,',
'639584', '400.00', '', '12,332.07'],
['01Oct', 'CHEQUE PROCESSING FEE, , ,', '', '0.50', '', '12,331.57']
]
I've been cracking my head all night on this with no luck.
CodePudding user response:
Try the following code snippet. It checks the required condition, concatenates the string, deletes that inner list. It only increments the index if the list was not deleted, else keeps it the same so looping isn't disrupted.
while (i < len(transaction_list)):
if transaction_list[i][0] == '':
transaction_list[i-1][1] = transaction_list[i-1][1] ' \n ' transaction_list[i][1]
del transaction_list[i]
else:
i = 1
print(transaction_list)
Hope it Helps!
CodePudding user response:
Once you find the right groupby level, the rest can be accomplished with a custom agg function.
The groups can be determined with the cumsum of non null col0
import pandas as pd
import numpy as np
transaction_list = [
['01Oct', 'CMS ACC TRF /MISC CREDIT, AG', '', '', '5,000.00', '11,377.00'],
['', 'MULTI RESOURCES LIMITED,', '', '', '', ''],
['', 'CMS19274001077, 1094175DAAA107', '', '', '', ''],
['01Oct', 'INTER ACC CREDIT, SY', '', '', '1,000.00', '12,732.07'],
['', 'WTA CO3 (MAUR) LIMITED,', '', '', '', ''],
['', 'CMS19274009397, 729981UAAA298', '', '', '', ''],
['01Oct', 'HOUSE CHEQUE PRESENTED, , 639584,', '639584', '400.00', '', '12,332.07'],
['01Oct', 'CHEQUE PROCESSING FEE, , ,', '', '0.50', '', '12,331.57']
]
df = pd.DataFrame(transaction_list)
df = df.replace('',np.nan)
df.groupby((~df[0].isnull()).cumsum()).agg({0:'first',
1: lambda x: '\n'.join(x),
2:'first',
3:'first',
4:'first',
5:'first'}).fillna('').values.tolist()
Output
[['01Oct',
'CMS ACC TRF /MISC CREDIT, AG\nMULTI RESOURCES LIMITED,\nCMS19274001077, 1094175DAAA107',
'',
'',
'5,000.00',
'11,377.00'],
['01Oct',
'INTER ACC CREDIT, SY\nWTA CO3 (MAUR) LIMITED,\nCMS19274009397, 729981UAAA298',
'',
'',
'1,000.00',
'12,732.07'],
['01Oct',
'HOUSE CHEQUE PRESENTED, , 639584,',
'639584',
'400.00',
'',
'12,332.07'],
['01Oct', 'CHEQUE PROCESSING FEE, , ,', '', '0.50', '', '12,331.57']]