I have the following dataframe :
df Item ItemList
1 a r,t,y
2 b z,q,b
3 c c,d
4 b a
5 a a,z
and I want to check for each row if item is in itemList, and if so add a Flag like this :
df Item ItemList Flag
1 a r,t,y NaN
2 b z,q,b Yes
3 c c,d Yes
4 b a NaN
5 a a,z Yes
The only answer I found so far is this one:
Check if comma separated values in a dataframe contains values from another dataframe in python
But using isin() does not seem to do the trick row by row.
Thank you for your help !
CodePudding user response:
Test if exist values by splitted ItemList
in list comprehension with zip
and if-else
use for set Yes
or missing values:
df['Flag'] = ['Yes' if x in y.split(',')
else np.nan for x, y in zip(df['Item'], df['ItemList'])]
print (df)
df Item ItemList Flag
0 1 a r,t,y NaN
1 2 b z,q,b Yes
2 3 c c,d Yes
3 4 b a NaN
4 5 a a,z Yes
CodePudding user response:
I would do it following way
import pandas as pd
df = pd.DataFrame({"Item":["a","b","c","b","a"],"ItemList":["r,t,y","z,q,b","c,d","a","a,z"]})
df["Flag"] = df.apply(lambda row:"Yes" if row.Item in row.ItemList.split(",") else "NaN",axis=1)
print(df)
output
Item ItemList Flag
0 a r,t,y NaN
1 b z,q,b Yes
2 c c,d Yes
3 b a NaN
4 a a,z Yes
Explanation: .apply
with axis=1
apply function to each row, function (here lambda
) accept row as pandas.Series
CodePudding user response:
If you have many repeated items in Item
(ie. few groups), it might be advantageous to use groupby
:
df['Flag'] = np.where(df.groupby('Item')['ItemList']
.apply(lambda s: s.str.contains(s.name)),
'Yes', np.nan)
output:
Item ItemList Flag
0 a r,t,y NaN
1 b z,q,b Yes
2 c c,d Yes
3 b a NaN
4 a a,z Yes