Converting 2 column dataframe of codes and inconsistent descriptions into a nested list of all possi

Apologies for the poor wording of this post's title; I'm unsure how best to simplify the explanation of what I'm trying to do.

I have a dataframe output where accounting codes with an inconsistent description column between lines are flagged up. For example:

   Accounting Codes Account Description
10              D_B                   2
10              D_B                 two
11              D_C                   3
11              D_C               three
12              D_D                   4
12              D_D                four
13              D_D                FOUR

I'm trying to use this dataframe to map each unique code to a tkinter label, whilst each of the descriptions matching that code is mapped to a combobox dropdown list. So for example, I have a label marked "D_D" and a combobox in the next column with drop down options of "4", "four" and "FOUR", and the same for the other 2 account codes.

The major issue I'm having is converting the above data frame into a list format like the one below where the account code is the first item, followed by the duplicate descriptions, that can then be easily looped through to generate the tkinter elements:

duplicates = [
    ['D_B', '2', 'two'],
    ['D_C', '3', 'three'],
    ['D_D', '4', 'four', 'FOUR'],
]

I'm at a complete loss on how to even begin converting this. I've tried looking at aggregate and groupby, but can't figure out how to achieve the above output.

Answer:

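Use groupby followed by apply. Inside apply, each group arrives as a Series whose name is the group key (the accounting code), so the code can be placed in front of the unpacked descriptions: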

# x.name is the group key; *x unpacks that group's descriptions
duplicates = (
    df.groupby('Accounting Codes')['Account Description']
      .apply(lambda x: [x.name, *x]).tolist()
)
print(duplicates)

# Output:
[['D_B', '2', 'two'], ['D_C', '3', 'three'], ['D_D', '4', 'four', 'FOUR']]
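
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the descriptions are stored as strings and retypes the sample rows from the question with a default index. The build_rows helper is a hypothetical name, not something from the question, and only illustrates how each nested list could be split into a label and a combobox.

```python
import pandas as pd

# Sample rows retyped from the question (assumption: descriptions are strings).
df = pd.DataFrame({
    "Accounting Codes": ["D_B", "D_B", "D_C", "D_C", "D_D", "D_D", "D_D"],
    "Account Description": ["2", "two", "3", "three", "4", "four", "FOUR"],
})

# One list per code: the code first, then every description seen for it.
duplicates = (
    df.groupby("Accounting Codes")["Account Description"]
      .apply(lambda x: [x.name, *x])
      .tolist()
)


def build_rows(parent, rows):
    """Hypothetical helper: one Label and one Combobox per accounting code."""
    # Imported here so the data step above runs even without a display.
    from tkinter import ttk

    boxes = {}
    # Star-unpacking splits each row into the code and its descriptions.
    for row_index, (code, *descriptions) in enumerate(rows):
        ttk.Label(parent, text=code).grid(row=row_index, column=0)
        box = ttk.Combobox(parent, values=descriptions, state="readonly")
        box.grid(row=row_index, column=1)
        boxes[code] = box
    return boxes
```

To display the widgets, create a root window with tkinter.Tk(), call build_rows(root, duplicates), and then start root.mainloop(). Returning the comboboxes keyed by code makes it straightforward to read the chosen description back later with boxes[code].get().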