Create a Nested list from a pandas data frame

I am trying to create a kind of nested list from a pandas data frame.

I have this data frame:

     id1       Name1     ids1                        Name2      ids2                     ID     col1  Goal     col2    col3       
0   ab-85643      aasd1   234,34,11223,345,345_2        vaasd1    2234,354,223,35,3435     G-0001     1   NaN       3       1      
1   ab-85644      aasd2   2343,355,121,34                                                  G-0002     2   56.0000   4       22     
2   ab-8564312    aabsd1  24 , 23 ,244 ,2421 ,567 ,789                                     G-00023    3   NaN       32      33     
3   ab-8564314    aabsd2  87 ,35 ,67_1                  averabsd   387 ,355 ,667_1         G-01034    4   89.0000   43      44 

# Here is the data frame above as a dictionary, produced by the call below; the last line converts it back to a pandas data frame
df.to_dict()

dic = {'id1  ': {0: 'ab-85643', 1: 'ab-85644', 2: 'ab-8564312', 3: 'ab-8564314'},
'Name1': {0: 'aasd1 ', 1: 'aasd2 ', 2: 'aabsd1', 3: 'aabsd2'},
 'ids1 ': {0: '234,34,11223,345,345_2      ',
  1: '2343,355,121,34             ',
  2: '24 , 23 ,244 ,2421 ,567 ,789',
  3: '87 ,35 ,67_1                '},
 'Name2': {0: 'vaasd1  ', 1: '        ', 2: '        ', 3: 'averabsd'},
 'ids2': {0: '2234,354,223,35,3435',
  1: '                    ',
  2: '                    ',
  3: ' 387 ,355 ,667_1  '},
 'ID': {0: 'G-0001 ', 1: 'G-0002 ', 2: 'G-00023', 3: 'G-01034'},
 'col1': {0: 1, 1: 2, 2: 3, 3: 4},
 'Goal    ': {0: ' NaN    ', 1: 56, 2: ' NaN    ', 3: 89},
 'col2': {0: 3, 1: 4, 2: 32, 3: 43},
 'col3': {0: 1, 1: 22, 2: 33, 3: 44}}

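import pandas as pd

df = pd.DataFrame.from_dict(dic)
df.columns = df.columns.str.strip()  # some keys above carry trailing spaces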

I want to build a nested list from the data frame above using the 'id1', 'Name1' and 'Name2' columns. For the first row, for example, id1 should go in one list (['ab-85643']) and 'Name1' and 'Name2' should go in another (['aasd1','vaasd1']). Those two lists should then sit together in one list for that row ([['aasd1','vaasd1'],['ab-85643']]). Some rows don't have a value in 'Name1' or 'Name2', and those blanks should be left out. This needs to be done for every row, and the final list should look like the one below.

collection = [[ ['aasd1','vaasd1'],['ab-85643'] ],[ ['aasd2'],['ab-85644'] ],[ ['aabsd1'],['ab-8564312'] ],[ ['aabsd2','averabsd'],['ab-8564314'] ]]

Is it possible to create that using Python?

Can someone give me an idea, please?

Anything is appreciated. Thanks in advance!

CodePudding user response：

It's easier if you apply a custom function to the relevant columns:

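def get_collections(row):
    # Strip padding from the two name columns, then drop the blank ones
    first = row.iloc[:2].str.strip()
    # Pair the remaining names with the id (third column), each in its own list
    return [first[first != ''].tolist(), [row.iloc[2]]]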

out = df[['Name1','Name2','id1']].apply(get_collections, axis=1).tolist()

Output:

[[['aasd1', 'vaasd1'], ['ab-85643']],
 [['aasd2'], ['ab-85644']],
 [['aabsd1'], ['ab-8564312']],
 [['aabsd2', 'averabsd'], ['ab-8564314']]]
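
One caveat: if an empty name is stored as a missing value (None/NaN) rather than a blank string, the `first != ''` filter above keeps it, because NaN compares unequal to `''`. Here is a minimal sketch of an alternative that drops both blanks and missing values with a plain list comprehension. The small frame is only a stand-in for the question's data (with one `None` put in for illustration), and `clean_names` is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Stand-in for the question's frame: only the three columns used here.
# One Name2 cell is None (an assumption) to show the missing-value case.
df = pd.DataFrame({
    "id1": ["ab-85643", "ab-85644", "ab-8564312", "ab-8564314"],
    "Name1": ["aasd1 ", "aasd2 ", "aabsd1", "aabsd2"],
    "Name2": ["vaasd1  ", "        ", None, "averabsd"],
})


def clean_names(*values):
    """Hypothetical helper: strip each string, drop blanks and non-strings."""
    stripped = (v.strip() for v in values if isinstance(v, str))
    return [v for v in stripped if v]


# One [names, [id]] pair per row, in row order.
collection = [
    [clean_names(name1, name2), [id1.strip()]]
    for name1, name2, id1 in zip(df["Name1"], df["Name2"], df["id1"])
]
print(collection)
```

Iterating over the columns with `zip` avoids building a Series per row, which `apply(..., axis=1)` does, so it may also be quicker on larger frames, though that has not been measured here.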