I have two DFs:
df1:
Date Fruit Num Color
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange 8.6 Orange
2013-11-24 Apple 7.6 Green
2013-11-24 Celery 10.2 Green
df2:
Date Fruit Num Color
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange 8.6 Orange
2013-11-24 Apple 7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple 22.1 Red
2013-11-25 Orange 8.6 Orange
Now I would like to compare the two dfs and put a column 'True' in df2 when the color column of df2 is residing in df1.
desired output:
Date Fruit Num Color Match
2013-11-24 Banana 22.1 Yellow True
2013-11-24 Orange 8.6 Orange True
2013-11-24 Apple 7.6 Green True
2013-11-24 Celery 10.2 Green True
2013-11-25 Apple 22.1 Red False
2013-11-25 Orange 8.6 Orange True
I came up with the following:
df2['Match'] = np.where(df2['Match'] == df1, True, False)
However got the following error:
ValueError: Can only compare identically-labeled Series objects
And tried the following
flat_user_data['Match'] = np.where(df2['Color'].isin(df1['Color']), True, False)
ValueError: Length of values (5) does not match length of index (10798)
CodePudding user response:
IIUC, Series.isin
:
df2['Match'] = df2['Color'].isin(df1['Color'])
Or np.isin
:
df2['Match'] = np.isin(df2['Color'], df1['Color'])
CodePudding user response:
In addition to what @ansev suggested, you can achieve this goal by:
checkList = list(df1["Color"])
df2['Match'] = [True if x in checkList else False for x in df2["Color"]]
df2
Output
Date | Fruit | Num | Color | Match | |
---|---|---|---|---|---|
0 | 2013-11-24 | Banana | 22.1 | Yellow | True |
1 | 2013-11-24 | Orange | 8.6 | Orange | True |
2 | 2013-11-24 | Apple | 7.6 | Green | True |
3 | 2013-11-24 | Celery | 10.2 | Green | True |
4 | 2013-11-25 | Apple | 22.1 | Red | False |
5 | 2013-11-25 | Orange | 8.6 | Orange | True |
Note that you can use [x in checkList for x in df2["Color"]]
instead of [True if x in checkList else False for x in df2["Color"]]
(Thanks to @ansev for the comment)