Context: I have the following dataframe:
gene_id Control_3Aligned.sortedByCoord.out.gtf Control_4Aligned.sortedByCoord.out.gtf ... NET_101Aligned.sortedByCoord.out.gtf NET_103Aligned.sortedByCoord.out.gtf NET_105Aligned.sortedByCoord.out.gtf
0 ENSG00000213279|Z97192.2 0 0 ... 3 2 7
1 ENSG00000132680|KHDC4 625 382 ... 406 465 262
2 ENSG00000145041|DCAF1 423 104 ... 231 475 254
3 ENSG00000102547|CAB39L 370 112 ... 265 393 389
4 ENSG00000173826|KCNH6 0 0 ... 0 0 0
And I'd like to get a nested dictionary like this example:
{'Control_3Aligned.sortedByCoord.out.gtf':
    {'ENSG00000213279|Z97192.2': 0,
     'ENSG00000132680|KHDC4': 625, ...},
 'Control_4Aligned.sortedByCoord.out.gtf':
    {'ENSG00000213279|Z97192.2': 0,
     'ENSG00000132680|KHDC4': 382, ...}}
So the general format would be:
{column_name: {row_name: value, ...}, ...}
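For reference, this {column_name: {row_name: value}} shape is what pandas' DataFrame.to_dict() produces with its default orient="dict", where the inner keys are the index labels. A minimal sketch, assuming the gene_id values are unique so they can be moved into the index first; the small frame only reuses a few rows and two sample columns from the question.

```python
import pandas as pd

# A few rows and two sample columns copied from the frame in the question
df = pd.DataFrame({
    "gene_id": ["ENSG00000213279|Z97192.2", "ENSG00000132680|KHDC4", "ENSG00000145041|DCAF1"],
    "Control_3Aligned.sortedByCoord.out.gtf": [0, 625, 423],
    "Control_4Aligned.sortedByCoord.out.gtf": [0, 382, 104],
})

# With gene_id as the index, to_dict() (default orient="dict") returns
# {column_name: {index_label: value}}, i.e. the requested nesting.
# Assumption: gene_id values are unique.
sample_dict = df.set_index("gene_id").to_dict()

print(sample_dict["Control_4Aligned.sortedByCoord.out.gtf"]["ENSG00000132680|KHDC4"])  # 382
```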
I was trying something like this:
sample_dict = {}
for column in df.columns[1:]:
    for index in range(0, len(df.index)):
        sample_dict.setdefault(column, {row_name: value for row_name, value in zip(df.iloc[index, 0], df.loc[index, column])})
        sample_dict[column] = {row_name: value for row_name, value in zip(df.iloc[index, 0], df.loc[index, column])}
But I keep getting TypeError: 'numpy.int64' object is not iterable.
(The problem seems to be in the zip() call: zip only accepts iterables, and I'm not really passing iterables here. I'm also fairly sure the way I'm populating the dictionary is wrong.)
Any help is very welcome! Thank you in advance
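The TypeError comes from what is passed to zip(): inside the loop, df.iloc[index, 0] is a single gene ID string and df.loc[index, column] is a single numpy.int64, while zip() needs every argument to be iterable. Passing the two whole columns instead pairs them row by row. A minimal sketch, assuming gene_id is the first column as in the frame above and the frame has the default integer index (the three-row frame is just a subset of the question's data):

```python
import pandas as pd

# Three rows and two sample columns taken from the question's frame
df = pd.DataFrame({
    "gene_id": ["ENSG00000213279|Z97192.2", "ENSG00000132680|KHDC4", "ENSG00000145041|DCAF1"],
    "Control_3Aligned.sortedByCoord.out.gtf": [0, 625, 423],
    "Control_4Aligned.sortedByCoord.out.gtf": [0, 382, 104],
})

# Original pattern: one scalar string and one scalar numpy.int64.
# zip() calls iter() on each argument, and numpy.int64 is not iterable.
try:
    zip(df.iloc[1, 0], df.loc[1, "Control_3Aligned.sortedByCoord.out.gtf"])
    scalar_zip_failed = False
except TypeError:
    scalar_zip_failed = True

# Working pattern: zip the gene_id column against each whole sample column
sample_dict = {
    column: dict(zip(df["gene_id"], df[column]))
    for column in df.columns[1:]  # skip gene_id itself
}
```

A single dict comprehension also removes the need for setdefault(), since each column's inner dict is built in one go.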
Answer:
I managed to do it like this:
sample_dict = {}

# Collect the gene IDs in row order (assumes df has the default 0..n-1 integer index)
gene_list = []
for index in range(0, len(df.index)):
    temp_data = df.loc[index, 'gene_id']
    gene_list.append(temp_data)

# Build one {gene_id: value} dict per sample column
for column in df.columns[1:]:
    gene_dict = {}
    for index in range(0, len(df.index)):
        # Keep only the first value if a gene_id appears more than once
        if gene_list[index] not in gene_dict:
            gene_dict[gene_list[index]] = df.loc[index, column]
    sample_dict[column] = gene_dict

# Inspect the first (column, {gene_id: value}) pair
dict_pairs = sample_dict.items()
pairs_iterator = iter(dict_pairs)
first_pair = next(pairs_iterator)
print(first_pair)
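The loop above keeps the first value whenever a gene_id repeats. Assuming that keep-first behaviour is what you want, a more compact sketch with the same rule drops duplicate gene IDs before calling DataFrame.to_dict(). The third row in this small frame is a made-up duplicate, added only to show the rule in action.

```python
import pandas as pd

df = pd.DataFrame({
    # The third row is a hypothetical duplicate of KHDC4, for illustration only
    "gene_id": ["ENSG00000213279|Z97192.2", "ENSG00000132680|KHDC4", "ENSG00000132680|KHDC4"],
    "Control_3Aligned.sortedByCoord.out.gtf": [0, 625, 999],
    "Control_4Aligned.sortedByCoord.out.gtf": [0, 382, 999],
})

# Mirror the loop's "if gene not in gene_dict" rule: keep the first row per gene_id,
# then nest as {column_name: {gene_id: value}}
sample_dict = (
    df.drop_duplicates(subset="gene_id", keep="first")
      .set_index("gene_id")
      .to_dict()
)

# Peek at the first (column, {gene_id: value}) pair in one line
first_column, first_genes = next(iter(sample_dict.items()))
print(first_column)  # Control_3Aligned.sortedByCoord.out.gtf
```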