Home > OS >  How to get values from a column based on values in another column in pandas
How to get values from a column based on values in another column in pandas

Time:05-05

I have a dataframe that looks like sample DF below:

import pandas as pd
df = pd.DataFrame({'user_id' : 
                   ['aefc6',         
                    '7edc3',
                    '6f85b',
                    '6f99b',
                    '6f85b'],
                   'user_name' : ['john', 'jack', 'jill', 'tom', 'jill'],
                   'Event' : ['A','B','C','D', 'E'],
    'Collabs' : ['adsfkj', '6f85b,asdfad', 'adfad', '9b123', '101bv']
})

The scenario is the user_id, user_name columns are those of event managers. But some event managers are also collaborators for some other event as can be seen in the example of "jill" who was a collaborator for Event B.

The "collabs" column is a CSV column and I unnest it as below:

df['new'] = df['Collabs'].str.split(',')
df = df.explode('new')
df['exist'] = df['new'].map(lambda x : x in '|'.join(df['user_id']))

As mentioned above, we can see that 'jill' with user_id 6f85b is present in the Collab column. My requirement is I need to get the user_id and user_name of those who appear in the new column or where exist column is True. I.E I need to get the ids and names of managers who acted as collaborators for other events.

Expected output:

    user_id user_name   Event    Collabs        new     exist   collab_user_id  collab_user_name
0   aefc6   john        A        adsfkj         adsfkj  FALSE   NaN             NaN
1   7edc3   jack        B        6f85b,asdfad   6f85b   TRUE    6f85b           jill
1   7edc3   jack        B        6f85b,asdfad   asdfad  FALSE   NaN             NaN
2   6f85b   jill        C        adfad          adfad   FALSE   NaN             NaN
3   6f99b   tom         D        9b123          9b123   FALSE   NaN             NaN
4   6f85b   jill        E        101bv          101bv   FALSE   NaN             NaN

I tried joining:

pd.merge(df, df.loc[:,['user_name', 'new']], left_on = 'user_id', right_on = 'new', how = 'left')

But instead of "jill" the output (understandably) is having "jack". Could someone please let me know how to get the desired output.

CodePudding user response:

You can use isin to create exist column then use np.where to create collab_user_id and map the user_name to collab_user_name

df['exist'] = df['new'].isin(df['user_id'])
df['collab_user_id'] = np.where(df['exist'], df['new'], np.nan)
df['collab_user_name'] = df['collab_user_id'].map(dict(zip(df['user_id'], df['user_name'])))
  user_id user_name Event       Collabs     new  exist collab_user_id collab_user_name
0   aefc6      john     A        adsfkj  adsfkj  False            NaN              NaN
1   7edc3      jack     B  6f85b,asdfad   6f85b   True          6f85b             jill
1   7edc3      jack     B  6f85b,asdfad  asdfad  False            NaN              NaN
2   6f85b      jill     C         adfad   adfad  False            NaN              NaN
3   6f99b       tom     D         9b123   9b123  False            NaN              NaN
4   6f85b      jill     E         101bv   101bv  False            NaN              NaN
  • Related