How to make a nested dictionary from a pandas DataFrame using several columns?

I am trying to create a nested dictionary from a pandas data frame.

I have this data frame:

     id1            ids1                         Name1        Name2      ids2                     ID     col1  Goal     col2    col3       
0   85643        234,34,11223,345,345_2         aasd1        vaasd1    2234,354,223,35,3435     G-0001     1   NaN       3       1      
1   85644        2343,355,121,34                aasd2                                           G-0002     2   56.0000   4       22     
2   8564312      24 , 23 ,244 ,2421 ,567 ,789   aabsd1                                          G-00023    3   NaN       32      33     
3   8564314      87 ,35 ,67_1                   aabsd2       averabsd   387 ,355 ,667_1         G-01034    4   89.0000   43      44 

df.to_dict()
# Here is what you requested
{'id1  ': {0: 85643, 1: 85644, 2: 8564312, 3: 8564314},
 'ids1 ': {0: '234,34,11223,345,345_2      ',
  1: '2343,355,121,34             ',
  2: '24 , 23 ,244 ,2421 ,567 ,789',
  3: '87 ,35 ,67_1                '},
 'Name1': {0: 'aasd1 ', 1: 'aasd2 ', 2: 'aabsd1', 3: 'aabsd2'},
 'Name2': {0: 'vaasd1  ', 1: '        ', 2: '        ', 3: 'averabsd'},
 'ids2': {0: '2234,354,223,35,3435',
  1: '                    ',
  2: '                    ',
  3: ' 387 ,355 ,667_1  '},
 'ID': {0: 'G-0001 ', 1: 'G-0002 ', 2: 'G-00023', 3: 'G-01034'},
 'col1': {0: 1, 1: 2, 2: 3, 3: 4},
 'Goal    ': {0: ' NaN    ', 1: 56, 2: ' NaN    ', 3: 89},
 'col2': {0: 3, 1: 4, 2: 32, 3: 43},
 'col3': {0: 1, 1: 22, 2: 33, 3: 44}}

Each value in the "ID" column needs to be a key of the outer dictionary. Inside each of those dictionaries, the values from the 'Name1' and 'Name2' columns need to be keys that map to lists. The 'Name1' list is built from the "ids1" column and the 'Name2' list is built from the "ids2" column. I also need to put the "ID" value at the start of each list.

So I want to create a nested dictionary like the one below.

mapper={
"G-0001":{"aasd1":['G-0001','234','34','11223','345','345_2'],
"vaasd1":['G-0001','2234','354','223','35','3435']},
"G-0002":{"aasd2":['G-0002','2343','355','121','34']},
"G-00023":{"aabsd1":['G-00023','24' , '23' ,'244' ,'2421' ,'567' ,'789']},
"G-01034":{"aabsd2":['G-01034','87' ,'35' ,'67_1'],
"averabsd":['G-01034','387' ,'355' ,'667_1']}
}

Is it possible to create that? Can someone give me an idea, please? Anything is appreciated. Thanks in advance!

CodePudding user response：

Try:

Convert DataFrame from wide to long format
Drop rows without "Name" and append "ID" to "ids"
groupby and construct the required output dictionary.
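
To see what the first step does, here is a minimal sketch on a made-up two-row frame (the `demo` data and its values are hypothetical, not the question's data). `pd.wide_to_long` pairs up the numbered `Name`/`ids` columns so that each (row, suffix) combination becomes its own row:

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the reshape
demo = pd.DataFrame({
    "ID": ["A", "B"],
    "Name1": ["n1", "n2"],
    "Name2": ["m1", ""],
    "ids1": ["1,2", "3"],
    "ids2": ["4", ""],
})
demo["idx"] = demo.index  # wide_to_long needs a column that identifies each row

# "Name" and "ids" are the stub names; the numeric suffix (1, 2) ends up in "j"
long_demo = pd.wide_to_long(demo, stubnames=["Name", "ids"], i="idx", j="j")
print(long_demo)
```

The result has one row per (idx, j) pair, so four rows here, with a single "Name" column and a single "ids" column next to "ID". Rows whose "Name" is blank are the ones the second step drops.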

import pandas as pd

#remove extra spaces from column names
df.columns = df.columns.str.strip()

#assign an index and convert DataFrame from wide to long format
df["idx"] = df.index
wtl = pd.wide_to_long(df, ["Name","ids"], "idx","j")

#drop rows without Name
wtl = wtl[wtl["Name"].str.strip().str.len().gt(0)]

#append ID and clean up the ids column
wtl["ids"] = wtl["ID"] + "," + wtl["ids"]
#strip stray outer whitespace, then split on commas with optional surrounding spaces
wtl["ids"] = wtl["ids"].str.strip().str.split(r"\s*,\s*")

#groupby and construct required dictionary
output = wtl.groupby("ID").apply(lambda x: dict(zip(x["Name"],x["ids"]))).to_dict()

>>> output
{'G-0001': {'aasd1': ['G-0001', '234', '34', '11223', '345', '345_2'],
            'vaasd1': ['G-0001', '2234', '354', '223', '35', '3435']},
 'G-0002': {'aasd2': ['G-0002', '2343', '355', '121', '34']},
 'G-00023': {'aabsd1': ['G-00023', '24', '23', '244', '2421', '567', '789']},
 'G-01034': {'aabsd2': ['G-01034', '87', '35', '67_1'],
             'averabsd': ['G-01034', '387', '355', '667_1']}}
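
If the reshape feels heavy for a small frame, a plain loop over the rows can build the same mapping. This is only a sketch of an alternative, not the approach above. `build_mapper` and its `pairs` argument are hypothetical names, and the code assumes every Name and ids cell is a string (possibly blank), as in the `df.to_dict()` dump in the question; real NaN values would need extra handling.

```python
import pandas as pd

# Relevant columns copied from the df.to_dict() dump in the question
df = pd.DataFrame({
    "ID": ["G-0001 ", "G-0002 ", "G-00023", "G-01034"],
    "Name1": ["aasd1 ", "aasd2 ", "aabsd1", "aabsd2"],
    "ids1": ["234,34,11223,345,345_2      ",
             "2343,355,121,34             ",
             "24 , 23 ,244 ,2421 ,567 ,789",
             "87 ,35 ,67_1                "],
    "Name2": ["vaasd1  ", "        ", "        ", "averabsd"],
    "ids2": ["2234,354,223,35,3435",
             "                    ",
             "                    ",
             " 387 ,355 ,667_1  "],
})

def build_mapper(frame, pairs=(("Name1", "ids1"), ("Name2", "ids2"))):
    """Return {ID: {name: [ID, id, id, ...]}} for every non-blank name cell."""
    mapper = {}
    for _, row in frame.iterrows():
        key = row["ID"].strip()
        for name_col, ids_col in pairs:
            name = row[name_col].strip()
            if not name:
                continue  # blank Name cell: nothing to add for this pair
            # split on commas and drop the padding around each piece
            ids = [part.strip() for part in row[ids_col].split(",") if part.strip()]
            mapper.setdefault(key, {})[name] = [key, *ids]
    return mapper

mapper = build_mapper(df)
```

The loop strips whitespace from the ID, the names and every id, so padded cells like those in the question still produce clean keys and list items.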