Hi, I am trying to convert a DataFrame into a dictionary using Python. The DataFrame looks like this:
import numpy as np
import pandas as pd

data_quim = {'Type': [1, 2, 3], 'BCN': ['X', np.nan, np.nan], 'MAD': [np.nan, 'X', np.nan], 'MAT': ['X', np.nan, 'X']}
df = pd.DataFrame(data_quim)
df
   Type  BCN  MAD  MAT
0     1    X  NaN    X
1     2  NaN    X  NaN
2     3  NaN  NaN    X
I want to get a dictionary with this form:
{1: ['BCN', 'MAT'],
 2: ['MAD'],
 3: ['MAT']}
I have tried to_dict() after first setting 'Type' as the index, but I cannot get exactly what I want. Any thoughts?
CodePudding user response:
You could use:
(df.set_index('Type').stack()
.dropna()  # needed on pandas versions where stack() keeps NaN; harmless otherwise
.reset_index(1)
['level_1'].groupby(level=0)
.apply(list)
.to_dict()
)
output:
{1: ['BCN', 'MAT'], 2: ['MAD'], 3: ['MAT']}
Or, using apply:
(df.set_index('Type')
.apply(lambda s: list(s.dropna().index), axis=1)
.to_dict()
)
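For reference, here is a self-contained sketch that rebuilds the sample frame from the question and checks that the two approaches above agree. The names via_stack and via_apply are only for illustration, and the explicit dropna() after stack() is my addition, assuming a pandas version where stack() may keep NaN cells.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'Type': [1, 2, 3],
    'BCN': ['X', np.nan, np.nan],
    'MAD': [np.nan, 'X', np.nan],
    'MAT': ['X', np.nan, 'X'],
})

# Approach 1: stack to one row per (Type, column) pair, then collect the
# column labels per Type. dropna() is harmless where stack() already drops NaN.
via_stack = (
    df.set_index('Type').stack().dropna()
    .reset_index(1)['level_1']
    .groupby(level=0)
    .apply(list)
    .to_dict()
)

# Approach 2: per row, keep the labels of the non-missing cells
via_apply = (
    df.set_index('Type')
    .apply(lambda s: list(s.dropna().index), axis=1)
    .to_dict()
)

print(via_stack)
print(via_apply)
```

Both should give the dictionary asked for in the question, with the labels in column order.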
CodePudding user response:
I really like @mozway's second answer for this specific problem, but apply isn't always the fastest option. Since pandas is so flexible, you could also do this with pd.melt, which is a general way to transform wide data into long data, and then lean on groupby. This is more helpful if you need extra conditions or processing before putting values in the list (for example, if the cells hold 'X', 'Y', or NaN and you want to treat 'X' differently from 'Y'):
df.melt(id_vars='Type')\
.dropna()\
.groupby('Type')\
.agg({'variable': list})\
.to_dict()['variable']
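To make the 'X' vs 'Y' case concrete, here is a rough sketch on made-up data. The 'Y' marker, the frame df_xy, and the names only_x and by_marker are all hypothetical; the question's data only contains 'X' and NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Y' is an invented second marker, not in the question
df_xy = pd.DataFrame({
    'Type': [1, 2, 3],
    'BCN': ['X', np.nan, 'Y'],
    'MAD': [np.nan, 'X', np.nan],
    'MAT': ['Y', np.nan, 'X'],
})

# Long format: one row per marked cell, with columns Type, variable, value
long_df = df_xy.melt(id_vars='Type').dropna()

# Option A: filter to the 'X' rows before reducing, ignoring 'Y'
only_x = (
    long_df[long_df['value'] == 'X']
    .groupby('Type')['variable']
    .agg(list)
    .to_dict()
)

# Option B: keep both markers, with one list per (Type, marker) pair
by_marker = (
    long_df.groupby(['Type', 'value'])['variable']
    .agg(list)
    .to_dict()
)

print(only_x)
print(by_marker)
```

Once the data is in long form, the extra condition is just an ordinary filter or an extra groupby key, which is harder to express inside a row-wise apply.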
Or, if you wanted to get out of pandas as quickly as possible for some reason:
{
row[0]: [
label
for label, indicator in zip(df.columns[1:], row[1:])
if indicator == 'X'
]
for row in df.values
}
The big picture is that this is a good problem for mapping the original wide table to one key-value pair per 'X', and then reducing the values for each key into a list. Since your rows are already grouped by key, I really like @mozway's second suggestion. My second suggestion is a way to do it outside of pandas, and it would also work well if you instead needed to consume rows one at a time from a streaming iterator.
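The map-then-reduce idea can be sketched in plain Python for the streaming case. Everything here is illustrative: stream_rows is a hypothetical stand-in for whatever yields your rows (a file reader, a database cursor), and to_pairs and collect are names I made up.

```python
from collections import defaultdict


def stream_rows():
    # Hypothetical row source standing in for a file reader or database cursor
    yield {'Type': 1, 'BCN': 'X', 'MAD': None, 'MAT': 'X'}
    yield {'Type': 2, 'BCN': None, 'MAD': 'X', 'MAT': None}
    yield {'Type': 3, 'BCN': None, 'MAD': None, 'MAT': 'X'}


def to_pairs(rows, key='Type', marker='X'):
    # Map step: emit one (key, column label) pair per marked cell
    for row in rows:
        for label, value in row.items():
            if label != key and value == marker:
                yield row[key], label


def collect(pairs):
    # Reduce step: append each label to the list for its key
    grouped = defaultdict(list)
    for k, label in pairs:
        grouped[k].append(label)
    return dict(grouped)


result = collect(to_pairs(stream_rows()))
print(result)
```

Because to_pairs is a generator, rows are consumed one at a time, and the reduce step also works if rows for the same key arrive out of order.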