Pandas: Looking to avoid a for loop when creating a nested dictionary

Here is my data:

df:

id sub_id
A  1
A  2
B  3
B  4

I also have the following array (the variable arrays in the code below), with one row per DataFrame row:

[[1,2],
[2,5],
[1,4],
[7,8]]

Here is my code:

from collections import defaultdict

sub_id_array_dict = defaultdict(dict)
for i, s, a in zip(df['id'].to_list(), df['sub_id'].to_list(), arrays):
    sub_id_array_dict[i][s] = a

Now, my actual dataframe includes a total of 100M rows (unique sub_id) with 500K unique ids. Ideally, I'd like to avoid a for loop.

Any help would be much appreciated.

CodePudding user response：

Assuming arrays has the same number of rows as the DataFrame, assign it to a new column:

df['value'] = arrays

Then group by id and convert each group into a dictionary:

df.groupby('id').apply(lambda x: dict(zip(x.sub_id, x.value))).to_dict()

Output:

{'A': {1: [1, 2], 2: [2, 5]}, 'B': {3: [1, 4], 4: [7, 8]}}
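
For anyone who wants to run this end to end, here is a minimal sketch on the sample data from the question. It assumes arrays is a list of lists with one entry per row; the names result, key and group are only illustrative. It swaps apply for a dict comprehension over the groups, which is a variant of the approach above and not a benchmarked improvement.

```python
import pandas as pd

# Sample data from the question; `arrays` is assumed to be a list of lists
# with exactly one entry per DataFrame row.
df = pd.DataFrame({"id": ["A", "A", "B", "B"], "sub_id": [1, 2, 3, 4]})
arrays = [[1, 2], [2, 5], [1, 4], [7, 8]]

# Store each inner list as one cell of an object column
df["value"] = arrays

# Build one inner dict per id from that group's sub_id/value pairs
result = {
    key: dict(zip(group["sub_id"], group["value"]))
    for key, group in df.groupby("id")
}
print(result)
```

The Python-level loop here runs once per id instead of once per row, but zip still walks every row inside each group, so it would need timing on the real 100M-row data before drawing any conclusions.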

CodePudding user response：

You can assign arrays to a column and then use pivot:

df['value'] = arrays
out = df.pivot(index='sub_id', columns='id', values='value').to_dict()

Output:

{'A': {1: [1, 2], 2: [2, 5], 3: nan, 4: nan},
 'B': {1: nan, 2: nan, 3: [1, 4], 4: [7, 8]}}

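If you want to get rid of the NaNs, filter them out. The check below treats any float that is not equal to itself as NaN, so it does not depend on np being imported or on the identity of np.nan:

new_out = {key: {k: v for k, v in val.items() if not (isinstance(v, float) and v != v)} for key, val in out.items()}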

Output:

{'A': {1: [1, 2], 2: [2, 5]}, 'B': {3: [1, 4], 4: [7, 8]}}
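
The pivot approach can be sketched end to end as follows, again on the question's sample data. Keyword arguments are used for pivot because pandas 2.0 and later reject the positional form. The helper is_missing is a hypothetical name introduced here, not a pandas function. Keep in mind that the intermediate table has one cell per (sub_id, id) pair, so with 100M sub_ids and 500K ids it may be far too large to hold in memory; treat this as an illustration for small inputs.

```python
import pandas as pd

# Sample data from the question, with the arrays already attached as a column
df = pd.DataFrame({"id": ["A", "A", "B", "B"], "sub_id": [1, 2, 3, 4]})
df["value"] = [[1, 2], [2, 5], [1, 4], [7, 8]]

# Wide table: one row per sub_id, one column per id, NaN where no pair exists
wide = df.pivot(index="sub_id", columns="id", values="value")
out = wide.to_dict()


def is_missing(v):
    """Hypothetical helper: True for None or a float NaN, False for lists."""
    return v is None or (isinstance(v, float) and v != v)


# Drop the filler entries that pivot introduced
new_out = {
    key: {k: v for k, v in val.items() if not is_missing(v)}
    for key, val in out.items()
}
print(new_out)
```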