I'd like to create multiple dictionaries based on unique values in a column, in this case unique strings in the 'Name' column as a key. And other columns of my choice also being a key. So for my given dataset, the Name column will have the same name repeated numerous times, but the number of times that name repeats varies completely and so also the amount of information corresponding to that name in other columns such as source.
Name Notes Source Number Info
0 Bob NaN NaN g:45 NaN
1 Billy 1.0 Home B:67 NaN
2 Billy 1.0 Work r:3 NaN
3 Billy NaN NaN D:90 NaN
4 Billy 0.0 School A:1 NaN
5 Eric 0.0 NaN R:35 NaN
6 Eric NaN Home f:3 NaN
I would like the final output dictionaries to read:
dict1 = {Name: Bob, Source:[Nan], Number: [g:45]}
dict2 = {Name: Billy, Source:[Home,Work,NaN,School], Number: [B:67,r:3,D:90,A:1}
dict3 = {Name: Eric, Source:[Nan, Home], Number: [R:35,f:3]}
The dataset is quite large and there may be NaN values in random locations hence the reason for including them. I would like to be able to always specify the keys going into the dictionaries (they will always be the same for each dictionary) and have the values filled in according to that name column. In the real dataset, there are much more names and columns etc so id like the code to be scalable to allow for this.
Any help on this problem is greatly appreciated!!!
CodePudding user response:
IIUC, we can generate a list of dictionaries:
df.groupby('Name')[['Source', 'Number']].agg(list).reset_index().to_dict('records')
Output:
[{'Name': 'Billy',
'Source': ['Home', 'Work', nan, 'School'],
'Number': ['B:67', 'r:3', 'D:90', 'A:1']},
{'Name': 'Bob', 'Source': [nan], 'Number': ['g:45']},
{'Name': 'Eric', 'Source': [nan, 'Home'], 'Number': ['R:35', 'f:3']}]