Home > other >  How to transform Row data into column data using pandas?
How to transform Row data into column data using pandas?

Time:11-30

I exported many reports from my system in xls in the same specific format and need to change them to another format:

Basically for every item description I need to insert the corresponding Account series it is in column J using pandas.

Data CP N0 N1 ITEM DEBIT CREDIT NET D/C
Account: (663)
31/10/2022 595 12 ITEM DESCRIPTION 4859 5.564,40 59.786,28 C
Account: (664)
31/10/2022 596 12 ITEM DESCRIPTION 234243 3.475,34 15.492,41 D
31/10/2022 103 14 ITEM DESCRIPTION 456456 0,01 15.492,40 C
Account: (678)
31/10/2022 597 12 ITEM DESCRIPTION 2332 6.555,27 71.503,39 C
Account: (689)
31/10/2022 608 13 ITEM DESCRIPTION 66546 266.516,00 504.013,87 D
31/10/2022 608 13 ITEM DESCRIPTION 57567 5.578,67 7.656.192,54 D
Account: (500)
31/10/2022 608 13 ITEM DESCRIPTION 345345 54.405,00 645.175,00 D

I tried to write a script but couldn't fetch a logic to fill the column. Could someone help me?

Desired format:

Data CP N0 N1 ITEM DEBIT CREDIT NET D/C Account
Account: (663)
31/10/2022 595 12 ITEM DESCRIPTION 4859 5.564,40 59.786,28 C Account: (663)
Account: (664)
31/10/2022 596 12 ITEM DESCRIPTION 234243 3.475,34 15.492,41 D Account: (664)
31/10/2022 103 14 ITEM DESCRIPTION 456456 0,01 15.492,40 C Account: (664)
Account: (678)
31/10/2022 597 12 ITEM DESCRIPTION 2332 6.555,27 71.503,39 C Account: (678)
Account: (689)
31/10/2022 608 13 ITEM DESCRIPTION 66546 266.516,00 504.013,87 D Account: (689)
31/10/2022 608 13 ITEM DESCRIPTION 57567 5.578,67 7.656.192,54 D Account: (689)
Account: (500)
31/10/2022 608 13 ITEM DESCRIPTION 345345 54.405,00 645.175,00 D Account: (500)

CodePudding user response:

try this:

mask = df['Data'].str.startswith('Account')
df['Account'] = df.groupby(mask.cumsum())['Data'].transform('first').mask(mask)
print(df)

# df data like this:

data = [{'Data': 'Account: (663)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 595.0,
  'N0': 'nan',
  'N1': 12.0,
  'ITEM': 'ITEM DESCRIPTION 4859',
  'DEBIT': '5.564,40',
  'CREDIT': 'nan',
  'NET': '59.786,28',
  'D/C': 'C',
  'Account': 'Account: (663)'},
 {'Data': 'Account: (664)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 596.0,
  'N0': 'nan',
  'N1': 12.0,
  'ITEM': 'ITEM DESCRIPTION 234243',
  'DEBIT': '3.475,34',
  'CREDIT': 'nan',
  'NET': '15.492,41',
  'D/C': 'D',
  'Account': 'Account: (664)'},
 {'Data': '31/10/2022',
  'CP': 103.0,
  'N0': 'nan',
  'N1': 14.0,
  'ITEM': 'ITEM DESCRIPTION 456456',
  'DEBIT': 'nan',
  'CREDIT': '0,01',
  'NET': '15.492,40',
  'D/C': 'C',
  'Account': 'Account: (664)'},
 {'Data': 'Account: (678)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 597.0,
  'N0': 'nan',
  'N1': 12.0,
  'ITEM': 'ITEM DESCRIPTION 2332',
  'DEBIT': '6.555,27',
  'CREDIT': 'nan',
  'NET': '71.503,39',
  'D/C': 'C',
  'Account': 'Account: (678)'},
 {'Data': 'Account: (689)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 608.0,
  'N0': 'nan',
  'N1': 13.0,
  'ITEM': 'ITEM DESCRIPTION 66546',
  'DEBIT': '266.516,00',
  'CREDIT': 'nan',
  'NET': '504.013,87',
  'D/C': 'D',
  'Account': 'Account: (689)'},
 {'Data': '31/10/2022',
  'CP': 608.0,
  'N0': 'nan',
  'N1': 13.0,
  'ITEM': 'ITEM DESCRIPTION 57567',
  'DEBIT': '5.578,67',
  'CREDIT': 'nan',
  'NET': '7.656.192,54',
  'D/C': 'D',
  'Account': 'Account: (689)'},
 {'Data': 'Account: (500)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 608.0,
  'N0': 'nan',
  'N1': 13.0,
  'ITEM': 'ITEM DESCRIPTION 345345',
  'DEBIT': '54.405,00',
  'CREDIT': 'nan',
  'NET': '645.175,00',
  'D/C': 'D',
  'Account': 'Account: (500)'}]

df = pd.DataFrame(data).replace('nan', float('nan'))

CodePudding user response:

This is not the most "pythonic" way of doing this, but because finding the most efficient way is probably not so important here, this should work fine:

data_col = df['Data']

def find_last_acct_entry(date):
    idx = data_col.index(date)
    ret = idx
    while not data_col[ret].startswith('Account') and ret >= 1:
        ret -= 1
    return data_col[ret]

for idx, row in df.iterrows():
    if not row['Data'].startswith('Account'):
        acct = find_last_acct_entry(row['Data'])
        df.loc[idx, 'Account'] = acct
  • Related