I have a data frame like this:

   col1  col2  col3  col4  action_id
0     1     2     2     0   a, apple
1     1     2     3     5   b, apple
2   0.2   0.3     8     1   c, apple
3   0.2  0.02     1     2   a, apple
4    11    11    22    11   b, apple
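For reference, one way to reconstruct this frame is shown below. This is a minimal sketch, and it assumes action_id already holds tuple values such as ('a', 'apple'), matching the desired output; if action_id is instead stored as a single string like "a, apple", the keys would need to be split first.

import pandas as pd

# Minimal reconstruction of the example frame (assumption: action_id
# already contains tuples rather than comma-separated strings).
var = pd.DataFrame({
    'col1': [1, 1, 0.2, 0.2, 11],
    'col2': [2, 2, 0.3, 0.02, 11],
    'col3': [2, 3, 8, 1, 22],
    'col4': [0, 5, 1, 2, 11],
    'action_id': [('a', 'apple'), ('b', 'apple'), ('c', 'apple'),
                  ('a', 'apple'), ('b', 'apple')],
})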
I want to convert this data frame into a dict with action_id as the key and the other columns as the values.
I want my output in this manner:
{('a', 'apple'): array([[1, 2, 2, 0]]),
('b', 'apple'): array([[1, 2, 3, 5]]),
('c', 'apple'): array([[0.2, 0.3, 8, 1]]),
('a', 'apple'): array([[0.2, 0.02, 1, 2]]),
('b', 'apple'): array([[11, 11, 22, 11]])}
I have tried this method, with var as my dataframe:

data2d = var.set_index('action_id').T.to_dict('list')
But this method overwrites the values for duplicate keys and only keeps the last row for each duplicate key:

{('c', 'apple'): array([[0.2, 0.3, 8, 1]]),
 ('a', 'apple'): array([[0.2, 0.02, 1, 2]]),
 ('b', 'apple'): array([[11, 11, 22, 11]])}

Is there any way I can keep the duplicate keys, each with its own values?
CodePudding user response:
It is impossible to have duplicate keys in a Python dictionary.
If you want, you can aggregate the values per key at the list/array level:
var.set_index('action_id').groupby(level=0).agg(list).T.to_dict('list')
Output:
{('a', 'apple'): [[1.0, 0.2], [2.0, 0.02], [2, 1], [0, 2]],
('b', 'apple'): [[1.0, 11.0], [2.0, 11.0], [3, 22], [5, 11]],
('c', 'apple'): [[0.2], [0.3], [8], [1]]}
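If you prefer each value as a 2-D array with one row per original row, as in the question's desired output, you can transpose the aggregated lists afterwards. A minimal sketch, reusing the line above and storing its result in d:

import numpy as np

d = var.set_index('action_id').groupby(level=0).agg(list).T.to_dict('list')

# Each value is a list of per-column lists; transposing gives one array
# row per original dataframe row instead.
row_arrays = {key: np.array(cols).T for key, cols in d.items()}
# row_arrays[('a', 'apple')] -> array([[1.  , 2.  , 2.  , 0.  ],
#                                      [0.2 , 0.02, 1.  , 2.  ]])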
Or:
var.set_index('action_id').groupby(level=0).apply(lambda g: g.to_numpy()).to_dict()
Output:
{('a', 'apple'): array([[1.  , 2.  , 2.  , 0.  ],
                        [0.2 , 0.02, 1.  , 2.  ]]),
 ('b', 'apple'): array([[ 1.,  2.,  3.,  5.],
                        [11., 11., 22., 11.]]),
 ('c', 'apple'): array([[0.2, 0.3, 8. , 1. ]])}
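A quick usage check on this variant (a sketch, assuming the dict is stored in result and action_id holds tuples):

result = var.set_index('action_id').groupby(level=0).apply(lambda g: g.to_numpy()).to_dict()

# Every key maps to a 2-D array with one row per matching dataframe row.
print(result[('a', 'apple')].shape)   # (2, 4)
for row in result[('a', 'apple')]:
    print(row)                        # [1. 2. 2. 0.]  then  [0.2 0.02 1. 2.]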
CodePudding user response:
# Split the action_id strings into tuple keys
k = df1.action_id.str.split(",").map(tuple)
# Collect each row's col1..col4 values as a 1-D array
v = df1.loc[:, :'col4'].apply(lambda ss: ss.to_numpy(), axis=1)
dict(zip(k, v))
Output:
{('a', ' apple'): array([0.2 , 0.02, 1. , 2. ]),
('b', ' apple'): array([11., 11., 22., 11.]),
('c', ' apple'): array([0.2, 0.3, 8. , 1. ])}
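As the output shows, dict(zip(k, v)) still keeps only the last row for each repeated key. If every row needs to be preserved, one option is to collect the rows per key before building the arrays; a minimal sketch building on the same k and v:

from collections import defaultdict
import numpy as np

# Group the per-row arrays by their tuple key instead of overwriting them.
grouped = defaultdict(list)
for key, row in zip(k, v):
    grouped[key].append(row)

# Stack each key's rows into a single 2-D array.
result = {key: np.vstack(rows) for key, rows in grouped.items()}
# result[('a', ' apple')] -> array([[1.  , 2.  , 2.  , 0.  ],
#                                   [0.2 , 0.02, 1.  , 2.  ]])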