Home > Software design >  Pandas dataframe: Flag a row when column value is in comma-separated string column
Pandas dataframe: Flag a row when column value is in comma-separated string column

Time:05-10

I have the following dataframe :

df   Item    ItemList   
1      a       r,t,y     
2      b       z,q,b     
3      c         c,d     
4      b           a     
5      a         a,z  

and I want to check for each row if item is in itemList, and if so add a Flag like this :

df   Item    ItemList   Flag
1      a       r,t,y     NaN
2      b       z,q,b     Yes
3      c         c,d     Yes
4      b           a     NaN
5      a         a,z     Yes

The only answer I found so far is this one:

Check if comma separated values in a dataframe contains values from another dataframe in python

But using isin() does not seem to do the trick row by row.

Thank you for your help !

CodePudding user response:

Test if exist values by splitted ItemList in list comprehension with zip and if-else use for set Yes or missing values:

df['Flag'] = ['Yes' if x in y.split(',') 
                     else np.nan for x, y in zip(df['Item'], df['ItemList'])]

print (df)
   df Item ItemList Flag
0   1    a    r,t,y  NaN
1   2    b    z,q,b  Yes
2   3    c      c,d  Yes
3   4    b        a  NaN
4   5    a      a,z  Yes

CodePudding user response:

I would do it following way

import pandas as pd
df = pd.DataFrame({"Item":["a","b","c","b","a"],"ItemList":["r,t,y","z,q,b","c,d","a","a,z"]})
df["Flag"] = df.apply(lambda row:"Yes" if row.Item in row.ItemList.split(",") else "NaN",axis=1)
print(df)

output

  Item ItemList Flag
0    a    r,t,y  NaN
1    b    z,q,b  Yes
2    c      c,d  Yes
3    b        a  NaN
4    a      a,z  Yes

Explanation: .apply with axis=1 apply function to each row, function (here lambda) accept row as pandas.Series

CodePudding user response:

If you have many repeated items in Item (ie. few groups), it might be advantageous to use groupby:

df['Flag'] = np.where(df.groupby('Item')['ItemList']
                        .apply(lambda s: s.str.contains(s.name)),
                      'Yes', np.nan)

output:

  Item ItemList Flag
0    a    r,t,y  NaN
1    b    z,q,b  Yes
2    c      c,d  Yes
3    b        a  NaN
4    a      a,z  Yes

  • Related