I have a dataframe that looks like sample DF below:
import pandas as pd
df = pd.DataFrame({'user_id' :
['aefc6',
'7edc3',
'6f85b',
'6f99b',
'6f85b'],
'user_name' : ['john', 'jack', 'jill', 'tom', 'jill'],
'Event' : ['A','B','C','D', 'E'],
'Collabs' : ['adsfkj', '6f85b,asdfad', 'adfad', '9b123', '101bv']
})
The scenario is the user_id
, user_name
columns are those of event managers. But some event managers are also collaborators for some other event as can be seen in the example of "jill" who was a collaborator for Event B.
The "collabs" column is a CSV column and I unnest it as below:
df['new'] = df['Collabs'].str.split(',')
df = df.explode('new')
df['exist'] = df['new'].map(lambda x : x in '|'.join(df['user_id']))
As mentioned above, we can see that 'jill' with user_id
6f85b is present in the Collab
column. My requirement is I need to get the user_id
and user_name
of those who appear in the new
column or where exist
column is True
. I.E I need to get the ids and names of managers who acted as collaborators for other events.
Expected output:
user_id user_name Event Collabs new exist collab_user_id collab_user_name
0 aefc6 john A adsfkj adsfkj FALSE NaN NaN
1 7edc3 jack B 6f85b,asdfad 6f85b TRUE 6f85b jill
1 7edc3 jack B 6f85b,asdfad asdfad FALSE NaN NaN
2 6f85b jill C adfad adfad FALSE NaN NaN
3 6f99b tom D 9b123 9b123 FALSE NaN NaN
4 6f85b jill E 101bv 101bv FALSE NaN NaN
I tried joining:
pd.merge(df, df.loc[:,['user_name', 'new']], left_on = 'user_id', right_on = 'new', how = 'left')
But instead of "jill" the output (understandably) is having "jack". Could someone please let me know how to get the desired output.
CodePudding user response:
You can use isin
to create exist
column then use np.where
to create collab_user_id
and map the user_name
to collab_user_name
df['exist'] = df['new'].isin(df['user_id'])
df['collab_user_id'] = np.where(df['exist'], df['new'], np.nan)
df['collab_user_name'] = df['collab_user_id'].map(dict(zip(df['user_id'], df['user_name'])))
user_id user_name Event Collabs new exist collab_user_id collab_user_name
0 aefc6 john A adsfkj adsfkj False NaN NaN
1 7edc3 jack B 6f85b,asdfad 6f85b True 6f85b jill
1 7edc3 jack B 6f85b,asdfad asdfad False NaN NaN
2 6f85b jill C adfad adfad False NaN NaN
3 6f99b tom D 9b123 9b123 False NaN NaN
4 6f85b jill E 101bv 101bv False NaN NaN