I have the Python (pandas) code below, and I need to convert it to PySpark:
qm1['c1'] = [x[0] in x[1] for x in zip(qm1['id'], qm1['question'])]
qm1['c1'] = qm1['c1'].astype(str)
qm1a = qm1[(qm1.c1 == 'True')]
The output of this Python code is:
question | key | id | c1 |
---|---|---|---|
Women | 0 | omen | True |
machine | 0 | mac | True |
Could someone please help me with this? I am a beginner in Python.
CodePudding user response:
Here is my test data (as your question does not contain any):
df.show()
+--------+---+----+
|question|key|  id|
+--------+---+----+
|   Women|  0|omen|
| machine|  2| mac|
|     foo|  1| bar|
+--------+---+----+
And here is my code to create the expected output:
from pyspark.sql import functions as F
df = df.withColumn("c1", F.col("question").contains(F.col("id")))
df.show()
+--------+---+----+-----+
|question|key|  id|   c1|
+--------+---+----+-----+
|   Women|  0|omen| true|
| machine|  2| mac| true|
|     foo|  1| bar|false|
+--------+---+----+-----+
Then you can simply filter on c1:
df.where("c1").show()
+--------+---+----+----+
|question|key|  id|  c1|
+--------+---+----+----+
|   Women|  0|omen|true|
| machine|  2| mac|true|
+--------+---+----+----+