I am trying to create a nested dictionary from a pandas data frame.
I have this data frame:
   id1      ids1                          Name1   Name2     ids2                  ID       col1  Goal     col2  col3
0  85643    234,34,11223,345,345_2        aasd1   vaasd1    2234,354,223,35,3435  G-0001   1     NaN      3     1
1  85644    2343,355,121,34               aasd2                                   G-0002   2     56.0000  4     22
2  8564312  24 , 23 ,244 ,2421 ,567 ,789  aabsd1                                  G-00023  3     NaN      32    33
3  8564314  87 ,35 ,67_1                  aabsd2  averabsd  387 ,355 ,667_1       G-01034  4     89.0000  43    44
df.to_dict()
# Here is what you requested
{'id1 ': {0: 85643, 1: 85644, 2: 8564312, 3: 8564314},
'ids1 ': {0: '234,34,11223,345,345_2 ',
1: '2343,355,121,34 ',
2: '24 , 23 ,244 ,2421 ,567 ,789',
3: '87 ,35 ,67_1 '},
'Name1': {0: 'aasd1 ', 1: 'aasd2 ', 2: 'aabsd1', 3: 'aabsd2'},
'Name2': {0: 'vaasd1 ', 1: ' ', 2: ' ', 3: 'averabsd'},
'ids2': {0: '2234,354,223,35,3435',
1: ' ',
2: ' ',
3: ' 387 ,355 ,667_1 '},
'ID': {0: 'G-0001 ', 1: 'G-0002 ', 2: 'G-00023', 3: 'G-01034'},
'col1': {0: 1, 1: 2, 2: 3, 3: 4},
'Goal ': {0: ' NaN ', 1: 56, 2: ' NaN ', 3: 89},
'col2': {0: 3, 1: 4, 2: 32, 3: 43},
'col3': {0: 1, 1: 22, 2: 33, 3: 44}}
Each value in the "ID" column needs to be a key of the outer dictionary. Inside each of those dictionaries, the values from the 'Name1' and 'Name2' columns need to be keys whose values are lists. The 'Name1' list is built from the "ids1" column and the 'Name2' list is built from the "ids2" column. I also need the "ID" value as the first element of each list.
So I want to create a nested dictionary like the one below.
mapper={
"G-0001":{"aasd1":['G-0001','234','34','11223','345','345_2'],
"vaasd1":['G-0001','2234','354','223','35','3435']},
"G-0002":{"aasd2":['G-0002','2343','355','121','34']},
"G-00023":{"aabsd1":['G-00023','24' , '23' ,'244' ,'2421' ,'567' ,'789']},
"G-01034":{"aabsd2":['G-01034','87' ,'35' ,'67_1'],
"averabsd":['G-01034','387' ,'355' ,'667_1']}
}
Is it possible to create that? Can someone give me an idea, please? Anything is appreciated. Thanks in advance!
CodePudding user response:
Try:
- Convert the DataFrame from wide to long format
- Drop rows without a "Name" and prepend "ID" to "ids"
- groupby "ID" and construct the required output dictionary.
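To see what the wide-to-long step does in isolation, here is a minimal sketch on a hypothetical two-row frame (the `toy` data and its values are made up for illustration; only the `Name1`/`Name2` and `ids1`/`ids2` column pattern comes from the question). Each `NameN`/`idsN` pair becomes its own row, indexed by the original row and the numeric suffix:

```python
import pandas as pd

# Hypothetical frame with the same stub pattern as the question's data.
toy = pd.DataFrame({
    "ID": ["A", "B"],
    "Name1": ["x1", "y1"],
    "Name2": ["x2", ""],
    "ids1": ["1,2", "3"],
    "ids2": ["4,5", ""],
})
toy["idx"] = toy.index

# Columns Name1/Name2 collapse into "Name" and ids1/ids2 into "ids";
# the numeric suffix ends up in the index level named "j".
long_df = pd.wide_to_long(toy, stubnames=["Name", "ids"], i="idx", j="j")

# Keep only the rows that actually have a name.
long_df = long_df[long_df["Name"].str.strip().str.len().gt(0)]
print(long_df.sort_index())
```

Row B has no second name, so only three of the four long rows survive the filter.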
import pandas as pd

# remove extra spaces from column names
df.columns = df.columns.str.strip()
# assign an index column and convert the DataFrame from wide to long format
df["idx"] = df.index
wtl = pd.wide_to_long(df, ["Name", "ids"], i="idx", j="j")
# drop rows without a Name
wtl = wtl[wtl["Name"].str.strip().str.len().gt(0)].copy()
# strip stray spaces from ID and Name so they make clean dictionary keys
wtl[["ID", "Name"]] = wtl[["ID", "Name"]].apply(lambda s: s.str.strip())
# prepend ID and clean up the ids column
wtl["ids"] = wtl["ID"] + "," + wtl["ids"].str.strip()
wtl["ids"] = wtl["ids"].str.split(r"\s*,\s*")
# groupby and construct the required dictionary
output = wtl.groupby("ID").apply(lambda x: dict(zip(x["Name"], x["ids"]))).to_dict()
>>> output
{'G-0001': {'aasd1': ['G-0001', '234', '34', '11223', '345', '345_2'],
'vaasd1': ['G-0001', '2234', '354', '223', '35', '3435']},
'G-0002': {'aasd2': ['G-0002', '2343', '355', '121', '34']},
'G-00023': {'aabsd1': ['G-00023', '24', '23', '244', '2421', '567', '789']},
'G-01034': {'aabsd2': ['G-01034', '87', '35', '67_1'],
'averabsd': ['G-01034', '387', '355', '667_1']}}
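For anyone who wants to run this end to end, here is a self-contained sketch. It rebuilds only the relevant columns from the `df.to_dict()` output in the question (stray spaces included), strips whitespace up front, and, as a deliberate variation, replaces the final `groupby`/`apply` with a plain loop. The name `mapper` follows the question; the loop is an alternative, not the method used above.

```python
import pandas as pd

# Relevant columns copied from the question's df.to_dict() output.
df = pd.DataFrame({
    "id1 ": [85643, 85644, 8564312, 8564314],
    "ids1 ": ["234,34,11223,345,345_2 ", "2343,355,121,34 ",
              "24 , 23 ,244 ,2421 ,567 ,789", "87 ,35 ,67_1 "],
    "Name1": ["aasd1 ", "aasd2 ", "aabsd1", "aabsd2"],
    "Name2": ["vaasd1 ", " ", " ", "averabsd"],
    "ids2": ["2234,354,223,35,3435", " ", " ", " 387 ,355 ,667_1 "],
    "ID": ["G-0001 ", "G-0002 ", "G-00023", "G-01034"],
})
df.columns = df.columns.str.strip()

# Strip stray whitespace from every string column before reshaping.
for col in ["ids1", "ids2", "Name1", "Name2", "ID"]:
    df[col] = df[col].str.strip()

# Wide to long: one row per (original row, Name/ids pair).
df["idx"] = df.index
wtl = pd.wide_to_long(df, ["Name", "ids"], i="idx", j="j")
wtl = wtl[wtl["Name"].str.len().gt(0)].copy()

# Prepend the ID, then split on commas with any surrounding whitespace.
wtl["ids"] = (wtl["ID"] + "," + wtl["ids"]).str.split(r"\s*,\s*")

# Build the nested dictionary with a plain loop instead of groupby/apply.
mapper = {}
for id_, name, ids in zip(wtl["ID"], wtl["Name"], wtl["ids"]):
    mapper.setdefault(id_, {})[name] = ids

print(mapper)
```

The loop avoids `groupby(...).apply(...)` returning dictionaries, which can emit deprecation warnings on some recent pandas versions; for a frame this small either form should be fine.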