I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({"call 1": ['debit card', 'bond', np.nan],
                   "call 2": ['credit card', 'mortgage', 'spending limit'],
                   "call 3": ['payment limit', np.nan, np.nan]})
which is:
       call 1          call 2         call 3
0  debit card     credit card  payment limit
1        bond        mortgage            NaN
2         NaN  spending limit            NaN
I then did some clustering and produced a new DataFrame:
dfc = pd.DataFrame( {'cluster 1': ['payment limit', 'spending limit'],
'cluster 2': ['debit card', 'credit card'],
'cluster 3': [ 'bond', 'mortgage']})
which is:
        cluster 1    cluster 2  cluster 3
0   payment limit   debit card       bond
1  spending limit  credit card   mortgage
For each term in dfc, I want to know which call it came from. For example, payment limit originally comes from call 3, and so on. In other words, how can I build a new DataFrame from these two such that I get:
print(pd.DataFrame( {'cluster 1': [{'call 3': 'payment limit'}, {'call 2':'spending limit'}],
'cluster 2': [{'call 1':'debit card'}, {'call 2':'credit card'}],
'cluster 3': [ {'call 1':'bond'}, {'call 2':'mortgage'}]}))
CodePudding user response:
dfc.applymap(lambda x: df[df.eq(x)].dropna(how='all').dropna(axis=1).to_dict('records')[0])
Output:
                      cluster 1                  cluster 2               cluster 3
0   {'call 3': 'payment limit'}   {'call 1': 'debit card'}      {'call 1': 'bond'}
1  {'call 2': 'spending limit'}  {'call 2': 'credit card'}  {'call 2': 'mortgage'}
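The one-liner above can be easier to follow with its steps separated out. The following is only a sketch of the same approach, not a different method: the helper name `locate` and the `elementwise` fallback are my own additions. The fallback is there because, as far as I know, `applymap` is deprecated from pandas 2.1 onward in favour of `DataFrame.map`, so the snippet uses whichever one is available.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"call 1": ["debit card", "bond", np.nan],
                   "call 2": ["credit card", "mortgage", "spending limit"],
                   "call 3": ["payment limit", np.nan, np.nan]})
dfc = pd.DataFrame({"cluster 1": ["payment limit", "spending limit"],
                    "cluster 2": ["debit card", "credit card"],
                    "cluster 3": ["bond", "mortgage"]})


def locate(term):
    """Return {call name: term} for the cell of df that holds `term`."""
    mask = df.eq(term)                   # True only where a cell equals term
    matches = df[mask]                   # every other cell becomes NaN
    matches = matches.dropna(how="all")  # keep only rows that contain a match
    matches = matches.dropna(axis=1)     # keep only columns that contain a match
    return matches.to_dict("records")[0]


# Assumption: prefer DataFrame.map when it exists, else fall back to applymap.
elementwise = getattr(dfc, "map", None) or dfc.applymap
located = elementwise(locate)
print(located)
```

This assumes every term in dfc occurs somewhere in df. A term that is missing from df leaves an empty list of records, so the `[0]` raises an IndexError.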
CodePudding user response:
We can build a lookup dictionary that maps each value in the first DataFrame to the name of the column (call) it came from. We then replace the values in the second DataFrame with their lookup results and pair each result with the original value:
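lookup_dict = {}
look_df = df.T  # transpose so that the call names become the index
for col in look_df.columns:
    # map each value to the call it appears in
    lookup_dict.update(dict(zip(look_df[col], look_df.index)))
# stack the call names on top of the original values, then pair them up per row
pd.concat([dfc.replace(lookup_dict), dfc]).astype(str).groupby(level=0).agg(tuple)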
Output:
                  cluster 1              cluster 2           cluster 3
0   (call 3, payment limit)   (call 1, debit card)      (call 1, bond)
1  (call 2, spending limit)  (call 2, credit card)  (call 2, mortgage)
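If the dict-per-cell layout from the question is needed rather than tuples, the same lookup idea can be adapted. This is a minimal sketch under my own assumptions, not part of the answer above: it uses `melt` to build the lookup, the names `long_df`, `term_to_call` and `result` are hypothetical, and it assumes each term appears in only one call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"call 1": ["debit card", "bond", np.nan],
                   "call 2": ["credit card", "mortgage", "spending limit"],
                   "call 3": ["payment limit", np.nan, np.nan]})
dfc = pd.DataFrame({"cluster 1": ["payment limit", "spending limit"],
                    "cluster 2": ["debit card", "credit card"],
                    "cluster 3": ["bond", "mortgage"]})

# Long format: one row per (call, term) pair, with the NaN padding removed.
long_df = df.melt(var_name="call", value_name="term").dropna(subset=["term"])

# term -> call it came from (if a term were repeated, the later call would win).
term_to_call = dict(zip(long_df["term"], long_df["call"]))

# Wrap every term of dfc in a one-entry dict keyed by its source call.
result = pd.DataFrame(
    {col: [{term_to_call[term]: term} for term in dfc[col]] for col in dfc.columns},
    index=dfc.index,
)
print(result)
```

Because the lookup is a plain dict, a term in dfc that never occurs in df raises a KeyError here instead of being passed through silently.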