I want to add a column to a data frame. Depending on whether a certain field appears in the source JSON, the value of the column should be the value from the source or null. My code looks like this:
withColumn("STATUS_BIT", expr("case when 'statusBit:' in jsonDF.schema.simpleString() then statusBit else None end"))
When I run this, I am getting "mismatched input ''statusBit:'' expecting {<EOF>, '-'}". Am I doing something wrong with the quotation marks? When I try
withColumn("STATUS_BIT", expr("case when \'statusBit:\' in jsonDF.schema.simpleString() then statusBit else None end"))
I get the exact same error. Trying the whole thing without expr, as a plain when, triggers the error "condition should be a Column". Running 'statusBit:' in jsonDF.schema.simpleString() by itself returns True with the test data I am using, but somehow I can't integrate it into the data frame transformation. Thanks a lot for your help in advance.
Edit: applying the solution provided by PLTC has helped a lot, but I am still struggling to implement it in the when clause. I try
.withColumn("STATUS_BIT", when(lit(df.schema.simplestring()).contains("statusBit") is True, col(statusBit)).otherwise(None))
but it tells me "condition should be a Column". So I added an extra column called SCHEMA, which is equal to lit(df.schema.simpleString()), and I used this column in the condition:
.withColumn("STATUS_BIT", when(col("SCHEMA").contains("statusBit"), col("StatusBit")).otherwise(None)
The problem is that if I run this with test data that does not contain "statusBit", I get the error "No such struct field statusBit in ...", which is obviously the opposite of what I wanted to achieve.
CodePudding user response:
jsonDF.schema.simpleString()
is a plain Python string evaluated on the driver; the SQL parser inside expr cannot evaluate Python expressions like in, so use it the Python way:
from pyspark.sql import functions as F
df.withColumn("STATUS_BIT", F.lit(df.schema.simpleString()).contains('statusBit:'))