Pandas dataframe: Flag a row when column value is in comma-separated string column-CodePudding

I have the following dataframe :

df   Item    ItemList   
1      a       r,t,y     
2      b       z,q,b     
3      c         c,d     
4      b           a     
5      a         a,z

and I want to check for each row if item is in itemList, and if so add a Flag like this :

df   Item    ItemList   Flag
1      a       r,t,y     NaN
2      b       z,q,b     Yes
3      c         c,d     Yes
4      b           a     NaN
5      a         a,z     Yes

The only answer I found so far is this one:

Check if comma separated values in a dataframe contains values from another dataframe in python

But using isin() does not seem to do the trick row by row.

Thank you for your help !

CodePudding user response：

Test if exist values by splitted ItemList in list comprehension with zip and if-else use for set Yes or missing values:

df['Flag'] = ['Yes' if x in y.split(',') 
                     else np.nan for x, y in zip(df['Item'], df['ItemList'])]

print (df)
   df Item ItemList Flag
0   1    a    r,t,y  NaN
1   2    b    z,q,b  Yes
2   3    c      c,d  Yes
3   4    b        a  NaN
4   5    a      a,z  Yes

CodePudding user response：

I would do it following way

import pandas as pd
df = pd.DataFrame({"Item":["a","b","c","b","a"],"ItemList":["r,t,y","z,q,b","c,d","a","a,z"]})
df["Flag"] = df.apply(lambda row:"Yes" if row.Item in row.ItemList.split(",") else "NaN",axis=1)
print(df)

output

  Item ItemList Flag
0    a    r,t,y  NaN
1    b    z,q,b  Yes
2    c      c,d  Yes
3    b        a  NaN
4    a      a,z  Yes

Explanation: .apply with axis=1 apply function to each row, function (here lambda) accept row as pandas.Series

CodePudding user response：

If you have many repeated items in Item (ie. few groups), it might be advantageous to use groupby:

df['Flag'] = np.where(df.groupby('Item')['ItemList']
                        .apply(lambda s: s.str.contains(s.name)),
                      'Yes', np.nan)

output:

  Item ItemList Flag
0    a    r,t,y  NaN
1    b    z,q,b  Yes
2    c      c,d  Yes
3    b        a  NaN
4    a      a,z  Yes