I have a schema structure as below:
from pyspark.sql.types import StructType, StructField, ArrayType, MapType, StringType

schema = StructType([
    StructField('results', ArrayType(MapType(StringType(), StringType()), True), True),
    StructField('search_information', MapType(StringType(), StringType()), True),
    StructField('metadata', MapType(StringType(), StringType()), True),
    StructField('parameters', MapType(StringType(), StringType()), True),
    StructField('results_2', MapType(StringType(), StringType()), True),
])
Each file may or may not contain these columns, and I am reading the JSON file as:
df = spark.read.schema(schema).json(path)
I need to check whether certain columns exist and make the necessary transformations. I am checking for column existence like this:
if "metadata:" in df.schema.simpleString():
This always returns True because the column is defined in the schema, even when it is absent from the file. How can I check the raw file data for column existence?
CodePudding user response:
You can read the file without specifying the schema, so that Spark infers it from the data:
df = spark.read.option('multiline', 'true').json('file_name.json')
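Note that the inferred schema will only contain the fields that are actually present in the file, which you can verify by printing it:

# Only the columns present in this particular file will appear here
df.printSchema()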
Then, if you want to check for column existence, you can use one of the following:
if 'metadata' in df.columns:
if 'metadata' in df.schema.names:
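Putting this together with your transformation step, a minimal sketch could look like the following (the metadata_json column and the transformation itself are hypothetical placeholders; adapt them to your actual logic):

from pyspark.sql import functions as F

# Read with schema inference so only columns present in the file appear
df = spark.read.option('multiline', 'true').json(path)

if 'metadata' in df.columns:
    # hypothetical transformation: serialize the metadata column to a JSON string
    df = df.withColumn('metadata_json', F.to_json(F.col('metadata')))
else:
    # add the missing column as a typed null so downstream code sees a uniform shape
    df = df.withColumn('metadata', F.lit(None).cast('map<string,string>'))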
Another way is to use Python's json module to check for the existence of keys inside the JSON:
import json

with open('file_name.json') as f:
    j = json.load(f)

if "metadata" in j: