I exported many reports from my system in xls in the same specific format and need to change them to another format:
Basically for every item description I need to insert the corresponding Account series it is in column J using pandas.
Data | CP | N0 | N1 | ITEM | DEBIT | CREDIT | NET | D/C |
---|---|---|---|---|---|---|---|---|
Account: (663) | ||||||||
31/10/2022 | 595 | 12 | ITEM DESCRIPTION 4859 | 5.564,40 | 59.786,28 | C | ||
Account: (664) | ||||||||
31/10/2022 | 596 | 12 | ITEM DESCRIPTION 234243 | 3.475,34 | 15.492,41 | D | ||
31/10/2022 | 103 | 14 | ITEM DESCRIPTION 456456 | 0,01 | 15.492,40 | C | ||
Account: (678) | ||||||||
31/10/2022 | 597 | 12 | ITEM DESCRIPTION 2332 | 6.555,27 | 71.503,39 | C | ||
Account: (689) | ||||||||
31/10/2022 | 608 | 13 | ITEM DESCRIPTION 66546 | 266.516,00 | 504.013,87 | D | ||
31/10/2022 | 608 | 13 | ITEM DESCRIPTION 57567 | 5.578,67 | 7.656.192,54 | D | ||
Account: (500) | ||||||||
31/10/2022 | 608 | 13 | ITEM DESCRIPTION 345345 | 54.405,00 | 645.175,00 | D |
I tried to write a script but couldn't fetch a logic to fill the column. Could someone help me?
Desired format:
Data | CP | N0 | N1 | ITEM | DEBIT | CREDIT | NET | D/C | Account |
---|---|---|---|---|---|---|---|---|---|
Account: (663) | |||||||||
31/10/2022 | 595 | 12 | ITEM DESCRIPTION 4859 | 5.564,40 | 59.786,28 | C | Account: (663) | ||
Account: (664) | |||||||||
31/10/2022 | 596 | 12 | ITEM DESCRIPTION 234243 | 3.475,34 | 15.492,41 | D | Account: (664) | ||
31/10/2022 | 103 | 14 | ITEM DESCRIPTION 456456 | 0,01 | 15.492,40 | C | Account: (664) | ||
Account: (678) | |||||||||
31/10/2022 | 597 | 12 | ITEM DESCRIPTION 2332 | 6.555,27 | 71.503,39 | C | Account: (678) | ||
Account: (689) | |||||||||
31/10/2022 | 608 | 13 | ITEM DESCRIPTION 66546 | 266.516,00 | 504.013,87 | D | Account: (689) | ||
31/10/2022 | 608 | 13 | ITEM DESCRIPTION 57567 | 5.578,67 | 7.656.192,54 | D | Account: (689) | ||
Account: (500) | |||||||||
31/10/2022 | 608 | 13 | ITEM DESCRIPTION 345345 | 54.405,00 | 645.175,00 | D | Account: (500) |
CodePudding user response:
try this:
mask = df['Data'].str.startswith('Account')
df['Account'] = df.groupby(mask.cumsum())['Data'].transform('first').mask(mask)
print(df)
# df data like this:
data = [{'Data': 'Account: (663)',
'CP': 'nan',
'N0': 'nan',
'N1': 'nan',
'ITEM': 'nan',
'DEBIT': 'nan',
'CREDIT': 'nan',
'NET': 'nan',
'D/C': 'nan',
'Account': 'nan'},
{'Data': '31/10/2022',
'CP': 595.0,
'N0': 'nan',
'N1': 12.0,
'ITEM': 'ITEM DESCRIPTION 4859',
'DEBIT': '5.564,40',
'CREDIT': 'nan',
'NET': '59.786,28',
'D/C': 'C',
'Account': 'Account: (663)'},
{'Data': 'Account: (664)',
'CP': 'nan',
'N0': 'nan',
'N1': 'nan',
'ITEM': 'nan',
'DEBIT': 'nan',
'CREDIT': 'nan',
'NET': 'nan',
'D/C': 'nan',
'Account': 'nan'},
{'Data': '31/10/2022',
'CP': 596.0,
'N0': 'nan',
'N1': 12.0,
'ITEM': 'ITEM DESCRIPTION 234243',
'DEBIT': '3.475,34',
'CREDIT': 'nan',
'NET': '15.492,41',
'D/C': 'D',
'Account': 'Account: (664)'},
{'Data': '31/10/2022',
'CP': 103.0,
'N0': 'nan',
'N1': 14.0,
'ITEM': 'ITEM DESCRIPTION 456456',
'DEBIT': 'nan',
'CREDIT': '0,01',
'NET': '15.492,40',
'D/C': 'C',
'Account': 'Account: (664)'},
{'Data': 'Account: (678)',
'CP': 'nan',
'N0': 'nan',
'N1': 'nan',
'ITEM': 'nan',
'DEBIT': 'nan',
'CREDIT': 'nan',
'NET': 'nan',
'D/C': 'nan',
'Account': 'nan'},
{'Data': '31/10/2022',
'CP': 597.0,
'N0': 'nan',
'N1': 12.0,
'ITEM': 'ITEM DESCRIPTION 2332',
'DEBIT': '6.555,27',
'CREDIT': 'nan',
'NET': '71.503,39',
'D/C': 'C',
'Account': 'Account: (678)'},
{'Data': 'Account: (689)',
'CP': 'nan',
'N0': 'nan',
'N1': 'nan',
'ITEM': 'nan',
'DEBIT': 'nan',
'CREDIT': 'nan',
'NET': 'nan',
'D/C': 'nan',
'Account': 'nan'},
{'Data': '31/10/2022',
'CP': 608.0,
'N0': 'nan',
'N1': 13.0,
'ITEM': 'ITEM DESCRIPTION 66546',
'DEBIT': '266.516,00',
'CREDIT': 'nan',
'NET': '504.013,87',
'D/C': 'D',
'Account': 'Account: (689)'},
{'Data': '31/10/2022',
'CP': 608.0,
'N0': 'nan',
'N1': 13.0,
'ITEM': 'ITEM DESCRIPTION 57567',
'DEBIT': '5.578,67',
'CREDIT': 'nan',
'NET': '7.656.192,54',
'D/C': 'D',
'Account': 'Account: (689)'},
{'Data': 'Account: (500)',
'CP': 'nan',
'N0': 'nan',
'N1': 'nan',
'ITEM': 'nan',
'DEBIT': 'nan',
'CREDIT': 'nan',
'NET': 'nan',
'D/C': 'nan',
'Account': 'nan'},
{'Data': '31/10/2022',
'CP': 608.0,
'N0': 'nan',
'N1': 13.0,
'ITEM': 'ITEM DESCRIPTION 345345',
'DEBIT': '54.405,00',
'CREDIT': 'nan',
'NET': '645.175,00',
'D/C': 'D',
'Account': 'Account: (500)'}]
df = pd.DataFrame(data).replace('nan', float('nan'))
CodePudding user response:
This is not the most "pythonic" way of doing this, but because finding the most efficient way is probably not so important here, this should work fine:
data_col = df['Data']
def find_last_acct_entry(date):
idx = data_col.index(date)
ret = idx
while not data_col[ret].startswith('Account') and ret >= 1:
ret -= 1
return data_col[ret]
for idx, row in df.iterrows():
if not row['Data'].startswith('Account'):
acct = find_last_acct_entry(row['Data'])
df.loc[idx, 'Account'] = acct