I have a large dataframe. Here is an extract:
Index | Protocol ID | Activity ID | Detail 1 | Detail 2 |
---|---|---|---|---|
0 | 1509 | 15 | a | s |
1 | 636 | 159 | b | t |
2 | 787 | 159 | c | u |
3 | 796 | 159 | d | v |
4 | 1174 | 159 | e | w |
5 | 787 | 252 | f | x |
6 | 1029 | 252 | g | y |
7 | 1188 | NaN | h | z |
8 | 1848 | NaN | i | a |
9 | TBD | NaN | j | b |
10 | TBD | NaN | k | c |
11 | 1029 | 253 | l | d |
12 | 1170 | 253 | m | e |
13 | 1468 | NaN | n | f |
14 | 957 | NaN | o | g |
15 | 1029 | 254 | p | h |
16 | 957 | 254 | q | i |
17 | 841 | 166 | r | j |
I need to create a dictionary of dictionaries keyed by Activity ID. Each "activity dictionary" should contain one dataframe per Protocol ID, and each protocol dataframe should hold the complete information from all the initial columns.
So: a dictionary of activities → in each one, a dictionary of its protocols → in each one, a dataframe of all the information linked to that protocol.
When I run
activity_dict = initial_dataframe.set_index('Activity ID').T.to_dict('dict')
it does create a dictionary keyed by Activity ID, but when I open an activity, it only shows the last of its protocols and the information linked to it. Since this already does the last step I need, I am trying to add the intermediate step: the protocol dataframes, which should appear when I open an activity dictionary or run, for example:
activity_dict["159"]
which currently shows me
{'Protocol ID': '1174',
'Protocol Name': 'My analyses',
'I/O': 'O',
'Prot Owner': 'lorem.ipsum',
'Notes': 'Done',
'Activity Name': 'Test',
'Comments': nan,
'Activity Owner': nan,
'Protocol description': nan,
'Fonte': nan}
whereas I would like it to show me a link to the dataframes of not only protocol 1174 but also the others, such as 636 and 787.
Does anybody know how to do this?
Thank you in advance
CodePudding user response:
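Because a dictionary must have unique keys, rows that share an Activity ID overwrite each other and only the last one survives. One possible solution is to aggregate the values of each group into lists: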
activity_dict = initial_dataframe.groupby('Activity ID').agg(list).to_dict('index')
print(activity_dict)
{'15': {'Protocol ID': ['1509'], 'Detail 1': ['a'], 'Detail 2': ['s']},
'159': {'Protocol ID': ['636', '787', '796', '1174'], 'Detail 1': ['b', 'c', 'd', 'e'], 'Detail 2': ['t', 'u', 'v', 'w']},
'166': {'Protocol ID': ['841'], 'Detail 1': ['r'], 'Detail 2': ['j']},
'252': {'Protocol ID': ['787', '1029'], 'Detail 1': ['f', 'g'], 'Detail 2': ['x', 'y']},
'253': {'Protocol ID': ['1029', '1170'], 'Detail 1': ['l', 'm'], 'Detail 2': ['d', 'e']},
'254': {'Protocol ID': ['1029', '957'], 'Detail 1': ['p', 'q'], 'Detail 2': ['h', 'i']}}
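Or use join to combine the values of each group into comma-separated strings (this requires every aggregated column to contain strings):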
activity_dict = initial_dataframe.groupby('Activity ID').agg(','.join).to_dict('index')
print(activity_dict)
{'15': {'Protocol ID': '1509', 'Detail 1': 'a', 'Detail 2': 's'},
'159': {'Protocol ID': '636,787,796,1174', 'Detail 1': 'b,c,d,e', 'Detail 2': 't,u,v,w'},
'166': {'Protocol ID': '841', 'Detail 1': 'r', 'Detail 2': 'j'},
'252': {'Protocol ID': '787,1029', 'Detail 1': 'f,g', 'Detail 2': 'x,y'},
'253': {'Protocol ID': '1029,1170', 'Detail 1': 'l,m', 'Detail 2': 'd,e'},
'254': {'Protocol ID': '1029,957', 'Detail 1': 'p,q', 'Detail 2': 'h,i'}}
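If actual dataframes are needed at the innermost level, as the question describes, one possible sketch is a nested dictionary comprehension over two groupby calls. The sample frame below is a made-up subset of the extract, and it assumes the IDs are stored as strings (the string keys in the outputs above suggest this). Rows with a missing Activity ID are dropped, because groupby skips missing keys by default.

```python
import pandas as pd

# Hypothetical subset of the extract; IDs are assumed to be strings.
initial_dataframe = pd.DataFrame({
    "Protocol ID": ["1509", "636", "787", "796", "1174", "787", "1029", "1188"],
    "Activity ID": ["15", "159", "159", "159", "159", "252", "252", None],
    "Detail 1": list("abcdefgh"),
    "Detail 2": list("stuvwxyz"),
})

# Outer key: Activity ID. Inner key: Protocol ID.
# Value: the sub-DataFrame for that protocol, with all original columns.
# groupby drops rows whose key is missing (dropna=True is the default).
activity_dict = {
    activity_id: {
        protocol_id: protocol_df
        for protocol_id, protocol_df in activity_df.groupby("Protocol ID")
    }
    for activity_id, activity_df in initial_dataframe.groupby("Activity ID")
}

# Protocols available for activity "159", then one protocol's dataframe.
print(list(activity_dict["159"]))
print(activity_dict["159"]["1174"])
```

With this layout, activity_dict["159"] is a dictionary whose keys are the protocol IDs of that activity, and each value keeps every original column, including the two ID columns.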