Pandas dataframes in a dictionary nested inside another dictionary

Time:06-22

I have a large dataframe, here's an extract:

Index  Protocol ID  Activity ID  Detail 1  Detail 2
0      1509         15           a         s
1      636          159          b         t
2      787          159          c         u
3      796          159          d         v
4      1174         159          e         w
5      787          252          f         x
6      1029         252          g         y
7      1188         NaN          h         z
8      1848         NaN          i         a
9      TBD          NaN          j         b
10     TBD          NaN          k         c
11     1029         253          l         d
12     1170         253          m         e
13     1468         NaN          n         f
14     957          NaN          o         g
15     1029         254          p         h
16     957          254          q         i
17     841          166          r         j

I need to create a dictionary of dictionaries keyed by Activity ID. Each "activity dictionary" should contain one dataframe per Protocol ID, keyed by that Protocol ID, and each protocol dataframe should hold the complete information from all the original columns.

So, dictionary of activities → In each one, dictionary of their protocols → In each one, dataframe of all the information linked to the protocol
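The structure described above can be sketched with two nested groupby loops. This is only a minimal illustration, not necessarily what the real data needs: the small frame below is a stand-in for the extract (with IDs stored as strings), and it relies on groupby's default of leaving out rows whose Activity ID is NaN.

```python
import pandas as pd

# Stand-in for the extract above; IDs are assumed to be stored as strings.
initial_dataframe = pd.DataFrame({
    "Protocol ID": ["1509", "636", "787", "796", "1174", "787", "1029", "1188"],
    "Activity ID": ["15", "159", "159", "159", "159", "252", "252", float("nan")],
    "Detail 1": list("abcdefgh"),
    "Detail 2": list("stuvwxyz"),
})

# Outer level: one entry per Activity ID.
# Inner level: one dataframe per Protocol ID within that activity,
# keeping every original column.
activity_dict = {
    activity_id: {
        protocol_id: protocol_df
        for protocol_id, protocol_df in activity_df.groupby("Protocol ID")
    }
    for activity_id, activity_df in initial_dataframe.groupby("Activity ID")
}

# Look up one protocol dataframe inside one activity.
print(activity_dict["159"]["1174"])
```

With this layout, activity_dict["159"] is itself a dictionary whose keys are the protocol IDs of that activity, and each value is a regular dataframe. Rows without an Activity ID would need separate handling, for example by filling the missing IDs with a placeholder before grouping.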

When I run

activity_dict = initial_dataframe.set_index('Activity ID').T.to_dict('dict')

This does create a dictionary keyed by Activity ID, but when I open an activity, it shows only the last of its protocols and the information linked to it. Since this already does the last step I need, I'm trying to add the intermediate step: the protocol dataframes, which should appear when I open an activity dictionary or run, for example:

activity_dict["159"]

which currently shows me

{'Protocol ID': '1174',
 'Protocol Name': 'My analyses',
 'I/O': 'O',
 'Prot Owner': 'lorem.ipsum',
 'Notes': 'Done',
 'Activity Name': 'Test',
 'Comments': nan,
 'Activity Owner': nan,
 'Protocol description': nan,
 'Fonte': nan}  

Instead, I would like it to show me the dataframes linked not only to protocol 1174, but also to 636 and 787.
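The "only the last protocol" behaviour can be reproduced on a tiny frame. The sketch below uses made-up two-row data and assumes the cause is the repeated Activity ID: after set_index and .T the column labels are duplicated, and a dictionary can hold only one value per key, so later rows overwrite earlier ones.

```python
import warnings

import pandas as pd

# Two protocols that share the same Activity ID (illustrative data only).
df = pd.DataFrame({
    "Activity ID": ["159", "159"],
    "Protocol ID": ["636", "1174"],
})

with warnings.catch_warnings():
    # pandas may warn that the column labels are not unique; silenced here.
    warnings.simplefilter("ignore")
    result = df.set_index("Activity ID").T.to_dict("dict")

# Only one entry survives for "159", holding the last protocol.
print(result)
```

Grouping by Activity ID before building the dictionary avoids this, because each activity then contributes exactly one key.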

Does anybody know how to do this?

Thank you in advance

CodePudding user response:

Because dictionary keys must be unique, one possible solution is to collect all the values for each key into lists:

activity_dict = initial_dataframe.groupby('Activity ID').agg(list).to_dict('index')
print(activity_dict)

{'15': {'Protocol ID': ['1509'], 'Detail 1': ['a'], 'Detail 2': ['s']}, 
 '159': {'Protocol ID': ['636', '787', '796', '1174'], 'Detail 1': ['b', 'c', 'd', 'e'], 'Detail 2': ['t', 'u', 'v', 'w']}, 
 '166': {'Protocol ID': ['841'], 'Detail 1': ['r'], 'Detail 2': ['j']}, 
 '252': {'Protocol ID': ['787', '1029'], 'Detail 1': ['f', 'g'], 'Detail 2': ['x', 'y']}, 
 '253': {'Protocol ID': ['1029', '1170'], 'Detail 1': ['l', 'm'], 'Detail 2': ['d', 'e']}, 
 '254': {'Protocol ID': ['1029', '957'], 'Detail 1': ['p', 'q'], 'Detail 2': ['h', 'i']}}

Or use join to concatenate the values (this assumes every aggregated column holds strings):

activity_dict = initial_dataframe.groupby('Activity ID').agg(','.join).to_dict('index')
print(activity_dict)

{'15': {'Protocol ID': '1509', 'Detail 1': 'a', 'Detail 2': 's'}, 
 '159': {'Protocol ID': '636,787,796,1174', 'Detail 1': 'b,c,d,e', 'Detail 2': 't,u,v,w'}, 
 '166': {'Protocol ID': '841', 'Detail 1': 'r', 'Detail 2': 'j'}, 
 '252': {'Protocol ID': '787,1029', 'Detail 1': 'f,g', 'Detail 2': 'x,y'}, 
 '253': {'Protocol ID': '1029,1170', 'Detail 1': 'l,m', 'Detail 2': 'd,e'}, 
 '254': {'Protocol ID': '1029,957', 'Detail 1': 'p,q', 'Detail 2': 'h,i'}}
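The join variant raises a TypeError if a group contains NaN or other non-string values, which may matter here since the real data shows nan in columns such as Comments. A hedged sketch of one workaround follows; the helper name join_non_null and the sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values in a text column.
df = pd.DataFrame({
    "Activity ID": ["159", "159", "252"],
    "Protocol ID": ["636", "787", "1029"],
    "Comments": ["ok", np.nan, np.nan],
})

def join_non_null(values):
    """Join the non-missing values of a group as a comma-separated string."""
    return ",".join(values.dropna().astype(str))

joined = df.groupby("Activity ID").agg(join_non_null).to_dict("index")
print(joined)
```

A group whose values are all missing ends up as an empty string here; whether that or NaN is preferable depends on how the dictionary is used afterwards.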