Home > Software design >  Arranging a dataframe into multiple dictionaries based on unique values in a specific column
Arranging a dataframe into multiple dictionaries based on unique values in a specific column

Time:07-22

I'd like to create multiple dictionaries based on unique values in a column, in this case unique strings in the 'Name' column as a key. And other columns of my choice also being a key. So for my given dataset, the Name column will have the same name repeated numerous times, but the number of times that name repeats varies completely and so also the amount of information corresponding to that name in other columns such as source.

     Name  Notes  Source Number  Info
0    Bob    NaN     NaN   g:45   NaN
1  Billy    1.0    Home   B:67   NaN
2  Billy    1.0    Work    r:3   NaN
3  Billy    NaN     NaN   D:90   NaN
4  Billy    0.0  School    A:1   NaN
5   Eric    0.0     NaN   R:35   NaN
6   Eric    NaN    Home    f:3   NaN

I would like the final output dictionaries to read:

dict1 = {Name: Bob, Source:[Nan], Number: [g:45]}
dict2 = {Name: Billy, Source:[Home,Work,NaN,School], Number: [B:67,r:3,D:90,A:1}
dict3 = {Name: Eric, Source:[Nan, Home], Number: [R:35,f:3]}

The dataset is quite large and there may be NaN values in random locations hence the reason for including them. I would like to be able to always specify the keys going into the dictionaries (they will always be the same for each dictionary) and have the values filled in according to that name column. In the real dataset, there are much more names and columns etc so id like the code to be scalable to allow for this.

Any help on this problem is greatly appreciated!!!

CodePudding user response:

IIUC, we can generate a list of dictionaries:

df.groupby('Name')[['Source', 'Number']].agg(list).reset_index().to_dict('records')

Output:

[{'Name': 'Billy',
  'Source': ['Home', 'Work', nan, 'School'],
  'Number': ['B:67', 'r:3', 'D:90', 'A:1']},
 {'Name': 'Bob', 'Source': [nan], 'Number': ['g:45']},
 {'Name': 'Eric', 'Source': [nan, 'Home'], 'Number': ['R:35', 'f:3']}]
  • Related