I have a data frame like this:

   col1  col2  col3  col4  action_id
0     1     2     2     0   a, apple
1     1     2     3     5   b, apple
2   0.2   0.3     8     1   c, apple
3   0.2  0.02     1     2   a, apple
4    11    11    22    11   b, apple
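For reference, one way to reconstruct this frame is shown below. This is a minimal sketch, and it assumes action_id already holds tuple values such as ('a', 'apple'), matching the desired output; if action_id is instead stored as a single string like "a, apple", the keys would need to be split first.

import pandas as pd

# Minimal reconstruction of the example frame (assumption: action_id
# already contains tuples rather than comma-separated strings).
var = pd.DataFrame({
    'col1': [1, 1, 0.2, 0.2, 11],
    'col2': [2, 2, 0.3, 0.02, 11],
    'col3': [2, 3, 8, 1, 22],
    'col4': [0, 5, 1, 2, 11],
    'action_id': [('a', 'apple'), ('b', 'apple'), ('c', 'apple'),
                  ('a', 'apple'), ('b', 'apple')],
})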
I want to convert this data frame into a dict with action_id as the key and the other columns as the values.
I want my output in this manner:
{('a', 'apple'): array([[1, 2, 2, 0]]),
('b', 'apple'): array([[1, 2, 3, 5]]),
('c', 'apple'): array([[0.2, 0.3, 8, 1]]),
('a', 'apple'): array([[0.2, 0.02, 1, 2]]),
('b', 'apple'): array([[11, 11, 22, 11]])}
I have tried this method, with var as my dataframe:

data2d = var.set_index('action_id').T.to_dict('list')
But this method overwrites the values for duplicate keys and only keeps the last row for each duplicate key:

{('c', 'apple'): array([[0.2, 0.3, 8, 1]]),
 ('a', 'apple'): array([[0.2, 0.02, 1, 2]]),
 ('b', 'apple'): array([[11, 11, 22, 11]])}

Is there any way I can keep the duplicate keys, each with its own values?
CodePudding user response:
It is impossible to have duplicate keys in a Python dictionary.
If you want, you can aggregate the values per key at the list/array level:
var.set_index('action_id').groupby(level=0).agg(list).T.to_dict('list')
Output:
{('a', 'apple'): [[1.0, 0.2], [2.0, 0.02], [2, 1], [0, 2]],
('b', 'apple'): [[1.0, 11.0], [2.0, 11.0], [3, 22], [5, 11]],
('c', 'apple'): [[0.2], [0.3], [8], [1]]}
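If you prefer each value as a 2-D array with one row per original row, as in the question's desired output, you can transpose the aggregated lists afterwards. A minimal sketch, reusing the line above and storing its result in d:

import numpy as np

d = var.set_index('action_id').groupby(level=0).agg(list).T.to_dict('list')

# Each value is a list of per-column lists; transposing gives one array
# row per original dataframe row instead.
row_arrays = {key: np.array(cols).T for key, cols in d.items()}
# row_arrays[('a', 'apple')] -> array([[1.  , 2.  , 2.  , 0.  ],
#                                      [0.2 , 0.02, 1.  , 2.  ]])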
Or:
var.set_index('action_id').groupby(level=0).apply(lambda g: g.to_numpy()).to_dict()
Output:
{('a', 'apple'): array([[1.  , 2.  , 2.  , 0.  ],
                        [0.2 , 0.02, 1.  , 2.  ]]),
 ('b', 'apple'): array([[ 1.,  2.,  3.,  5.],
                        [11., 11., 22., 11.]]),
 ('c', 'apple'): array([[0.2, 0.3, 8. , 1. ]])}
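A quick usage check on this variant (a sketch, assuming the dict is stored in result and action_id holds tuples):

result = var.set_index('action_id').groupby(level=0).apply(lambda g: g.to_numpy()).to_dict()

# Every key maps to a 2-D array with one row per matching dataframe row.
print(result[('a', 'apple')].shape)   # (2, 4)
for row in result[('a', 'apple')]:
    print(row)                        # [1. 2. 2. 0.]  then  [0.2 0.02 1. 2.]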
CodePudding user response:
# Split the action_id strings into tuple keys
k = df1.action_id.str.split(",").map(tuple)
# Collect each row's col1..col4 values as a 1-D array
v = df1.loc[:, :'col4'].apply(lambda ss: ss.to_numpy(), axis=1)
dict(zip(k, v))
Output:
{('a', ' apple'): array([0.2 , 0.02, 1. , 2. ]),
('b', ' apple'): array([11., 11., 22., 11.]),
('c', ' apple'): array([0.2, 0.3, 8. , 1. ])}
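As the output shows, dict(zip(k, v)) still keeps only the last row for each repeated key. If every row needs to be preserved, one option is to collect the rows per key before building the arrays; a minimal sketch building on the same k and v:

from collections import defaultdict
import numpy as np

# Group the per-row arrays by their tuple key instead of overwriting them.
grouped = defaultdict(list)
for key, row in zip(k, v):
    grouped[key].append(row)

# Stack each key's rows into a single 2-D array.
result = {key: np.vstack(rows) for key, rows in grouped.items()}
# result[('a', ' apple')] -> array([[1.  , 2.  , 2.  , 0.  ],
#                                   [0.2 , 0.02, 1.  , 2.  ]])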