Create a nested dictionary using columns and row names as keys with dictionary comprehension

Context: I have the following dataframe:

                    gene_id  Control_3Aligned.sortedByCoord.out.gtf  Control_4Aligned.sortedByCoord.out.gtf  ...  NET_101Aligned.sortedByCoord.out.gtf  NET_103Aligned.sortedByCoord.out.gtf  NET_105Aligned.sortedByCoord.out.gtf
0  ENSG00000213279|Z97192.2                                       0                                       0  ...                                     3                                     2                                     7     
1     ENSG00000132680|KHDC4                                     625                                     382  ...                                   406                                   465                                   262     
2     ENSG00000145041|DCAF1                                     423                                     104  ...                                   231                                   475                                   254     
3    ENSG00000102547|CAB39L                                     370                                     112  ...                                   265                                   393                                   389     
4     ENSG00000173826|KCNH6                                       0                                       0  ...                                     0                                     0                                     0

I'd like to get a nested dictionary like this example:

   {Control_3Aligned.sortedByCoord.out.gtf: 
             {ENSG00000213279|Z97192.2:0, 
              ENSG00000132680|KHDC4:625,...},
    Control_4Aligned.sortedByCoord.out.gtf: 
             {ENSG00000213279|Z97192.2:0, 
              ENSG00000132680|KHDC4:382,...}}

So the general format would be:

{column_name : {row_name:value,...},...}

I was trying something like this:

sample_dict ={}

for column in df.columns[1:]:
    for index in range(0,len(df.index)+1):
        sample_dict.setdefault(column, {row_name:value for row_name,value in zip(df.iloc[index,0], df.loc[index,column])})
        sample_dict[column] += {row_name:value for row_name,value in zip(df.iloc[index,0], df.loc[index,column])}

But I keep getting TypeError: 'numpy.int64' object is not iterable. The problem seems to be in zip(), which only takes iterables, while here I'm passing it single values. The way I'm populating the dictionary is most likely wrong as well.
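The error can be reproduced outside the loop. The sketch below is only an illustration: the two scalar values are made-up stand-ins for what df.iloc[index,0] and df.loc[index,column] return for a single cell, not data taken from the real frame.

```python
import numpy as np

# Hypothetical stand-ins for a single cell lookup:
# df.iloc[index, 0] gives one string, df.loc[index, column] gives one numpy integer.
row_name = "ENSG00000132680|KHDC4"
value = np.int64(625)

try:
    # zip() needs every argument to be iterable; a numpy scalar is not.
    dict(zip(row_name, value))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Passing whole columns to zip() instead of single cells avoids the error, because a pandas Series is iterable.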

Any help is very welcome! Thank you in advance

CodePudding user response：

Managed to do it like this:

sample_dict = {}

# Collect the gene IDs once; they become the keys of every inner dictionary.
gene_list = df['gene_id'].tolist()

for column in df.columns[1:]:
    gene_dict = {}
    for gene, value in zip(gene_list, df[column]):
        # Keep the first value if a gene ID appears more than once.
        if gene not in gene_dict:
            gene_dict[gene] = value
    sample_dict[column] = gene_dict

# Inspect the first (column name, gene dictionary) pair to check the result.
first_pair = next(iter(sample_dict.items()))
first_pair
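
Since the question asks for a dictionary comprehension, here is a minimal sketch of that route. It assumes the gene_id values are unique and uses a small made-up frame with shortened, hypothetical column names (Control_3, Control_4) in place of the real one. With duplicate gene IDs the comprehension keeps the last value per gene, unlike the loop above, which keeps the first.

```python
import pandas as pd

# Made-up sample standing in for the real frame; column names are shortened.
df = pd.DataFrame({
    "gene_id": ["ENSG00000213279|Z97192.2", "ENSG00000132680|KHDC4"],
    "Control_3": [0, 625],
    "Control_4": [0, 382],
})

# Outer comprehension walks the sample columns;
# inner comprehension maps each gene_id to its value in that column.
sample_dict = {
    column: {gene: value for gene, value in zip(df["gene_id"], df[column])}
    for column in df.columns[1:]
}

# pandas can build the same {column: {row: value}} structure directly
# once gene_id is the index (assumes unique gene_id values).
same_dict = df.set_index("gene_id").to_dict()

print(sample_dict["Control_3"])
```

Both approaches give the {column_name: {row_name: value}} layout from the question; the to_dict() version is shorter, while the comprehension makes the key/value pairing explicit.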