I have Excel files that contain two columns. I want to check whether the value in each cell of column 1 appears anywhere in column 2.
If the value in a column 1 cell is present in column 2, the output must be 1; if not, 0.
Sample data:
COLUMN 1 COLUMN 2
ZUBEDA SALIBOKO JUMANNE REDEMPTHA MATINDI
STEPHEN STAFFORD MIHUNGO PETER G. DATTAN
JUMANNE MWALIMU JOANES PETER LUGAZIA
HUWAIDA IDRISSA JUMBE HAMIS JUMA IDD ISAKA
AIDANIA LUAMBANO EDWIN MARTIN MUHONDEZI
KESSY BONIFAS FULANO RICHARD THOMAS MLIWA
KENEDY STEPHEN MSHOMI JUMANNE MWALIMU
JOANES PETER LUGAZIA ISAAC RUGEMALILA ABRAHAM
MWANAISHA MOHAMED MUNGIA ZAITUN SALUM MGAZA
PETRO ZACHARIA MAGANGA STEPHEN STAFFORD MIHUNGO
Expected output:
COLUMN 1 COLUMN 2 RESULTS
ZUBEDA SALIBOKO JUMANNE REDEMPTHA MATINDI 0
STEPHEN STAFFORD MIHUNGO PETER G. DATTAN 1
JUMANNE MWALIMU JOANES PETER LUGAZIA 1
HUWAIDA IDRISSA JUMBE HAMIS JUMA IDD ISAKA 0
AIDANIA LUAMBANO EDWIN MARTIN MUHONDEZI 0
KESSY BONIFAS FULANO PETRO ZACHARIA MAGANGA 0
KENEDY STEPHEN MSHOMI JUMANNE MWALIMU 0
JOANES PETER LUGAZIA ISAAC RUGEMALILA ABRAHAM 0
MWANAISHA MOHAMED MUNGIA ZAITUN SALUM MGAZA 0
PETRO ZACHARIA MAGANGA STEPHEN STAFFORD MIHUNGO 1
I tried the following, but it is not valid syntax:
df['RESULTS'] = df['COLUMN 1'] isin df['COLUMN 2']
CodePudding user response:
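You almost had it. isin is a Series method, so call it with a dot and parentheses, then convert the boolean result to 0/1 with astype(int):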
df["RESULTS"] = df["COLUMN 1"].isin(df["COLUMN 2"]).astype(int)
>>> df
COLUMN 1 COLUMN 2 RESULTS
0 ZUBEDA SALIBOKO JUMANNE REDEMPTHA MATINDI 0
1 STEPHEN STAFFORD MIHUNGO PETER G. DATTAN 1
2 JUMANNE MWALIMU JOANES PETER LUGAZIA 1
3 HUWAIDA IDRISSA JUMBE HAMIS JUMA IDD ISAKA 0
4 AIDANIA LUAMBANO EDWIN MARTIN MUHONDEZI 0
5 KESSY BONIFAS FULANO RICHARD THOMAS MLIWA 0
6 KENEDY STEPHEN MSHOMI JUMANNE MWALIMU 0
7 JOANES PETER LUGAZIA ISAAC RUGEMALILA ABRAHAM 1
8 MWANAISHA MOHAMED MUNGIA ZAITUN SALUM MGAZA 0
9 PETRO ZACHARIA MAGANGA STEPHEN STAFFORD MIHUNGO 0
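Since the data comes from Excel, an exact match can fail because of stray spaces or inconsistent capitalisation. Below is a minimal sketch of how the same isin check could be made tolerant of that. It uses a hand-typed subset of the rows from the question, with a trailing space added to one value on purpose, and the normalize helper is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# A subset of the rows from the question, typed in by hand for illustration.
# Assumption: the trailing space after "JUMANNE MWALIMU " is added deliberately
# here to mimic the kind of stray whitespace that Excel data can contain.
df = pd.DataFrame({
    "COLUMN 1": [
        "STEPHEN STAFFORD MIHUNGO",
        "JUMANNE MWALIMU",
        "KENEDY STEPHEN MSHOMI",
        "JOANES PETER LUGAZIA",
        "PETRO ZACHARIA MAGANGA",
    ],
    "COLUMN 2": [
        "PETER G. DATTAN",
        "JOANES PETER LUGAZIA",
        "JUMANNE MWALIMU ",
        "ISAAC RUGEMALILA ABRAHAM",
        "STEPHEN STAFFORD MIHUNGO",
    ],
})


def normalize(names: pd.Series) -> pd.Series:
    """Hypothetical helper: trim, collapse repeated whitespace, upper-case."""
    return names.str.strip().str.replace(r"\s+", " ", regex=True).str.upper()


# Membership test on the cleaned values, converted from True/False to 1/0.
df["RESULTS"] = normalize(df["COLUMN 1"]).isin(normalize(df["COLUMN 2"])).astype(int)
print(df)
```

Without the normalize step, "JUMANNE MWALIMU" would not match the value with the trailing space and would get 0. Whether this kind of cleaning is appropriate depends on the real data.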
CodePudding user response:
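Use np.where together with isin. Comparing the columns with == only checks each row against itself, so it would not find a name that appears in a different row of COLUMN 2.
import numpy as np
df["RESULTS"] = np.where(df["COLUMN 1"].isin(df["COLUMN 2"]), 1, 0)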
https://numpy.org/doc/stable/reference/generated/numpy.where.html
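For comparison, here is a small sketch of how == and isin behave differently as the condition of np.where. It uses a hand-typed subset of the rows from the question; the variable names same_row and anywhere are only illustrative.

```python
import numpy as np
import pandas as pd

# A subset of the rows from the question, typed in by hand for illustration.
df = pd.DataFrame({
    "COLUMN 1": [
        "STEPHEN STAFFORD MIHUNGO",
        "JUMANNE MWALIMU",
        "KENEDY STEPHEN MSHOMI",
        "JOANES PETER LUGAZIA",
        "PETRO ZACHARIA MAGANGA",
    ],
    "COLUMN 2": [
        "PETER G. DATTAN",
        "JOANES PETER LUGAZIA",
        "JUMANNE MWALIMU",
        "ISAAC RUGEMALILA ABRAHAM",
        "STEPHEN STAFFORD MIHUNGO",
    ],
})

# Row-wise equality: is COLUMN 1 equal to COLUMN 2 in the same row?
same_row = np.where(df["COLUMN 1"] == df["COLUMN 2"], 1, 0)

# Membership: does the COLUMN 1 value occur anywhere in COLUMN 2?
anywhere = np.where(df["COLUMN 1"].isin(df["COLUMN 2"]), 1, 0)

print(same_row.tolist())  # [0, 0, 0, 0, 0]
print(anywhere.tolist())  # [1, 1, 0, 1, 0]
```

Only the isin version answers the question as asked, because no row in this sample has the same name in both columns.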