"Unable to infer schema for JSON." error in PySpark?-CodePudding

I have a JSON file with about 1,200,000 records. I want to read it with PySpark like this:

spark.read.option("multiline","true").json('file.json')

But it causes this error:

AnalysisException: Unable to infer schema for JSON. It must be specified manually.

When I create a smaller JSON file from a subset of the records in the main file, the same code reads it without error.

I can read the full JSON file with pandas when I set the encoding to utf-8-sig:

pd.read_json("file.json", encoding = 'utf-8-sig')

How can I solve this problem?

CodePudding user response:

Try this out:

spark.read.option("multiline","true").option("inferSchema", "true").json('file.json')

CodePudding user response:

Since setting the encoding helps in pandas, passing an encoding to Spark's JSON reader may be what you need:

spark.read.json("file.json", multiLine=True, encoding="utf8")
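
If the encoding option alone does not help: pandas needing utf-8-sig suggests the file may begin with a UTF-8 byte order mark (BOM), and that could be what trips up Spark's parser. This is a guess from the symptoms, not something confirmed in the question. A minimal sketch, using only the standard library, that copies the file and drops a leading BOM; the function name strip_utf8_bom and the output file name are made up for illustration:

```python
import codecs
import shutil


def strip_utf8_bom(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper, not part of PySpark or pandas.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        has_bom = head == codecs.BOM_UTF8
        if not has_bom:
            # No BOM: keep the first bytes exactly as they are.
            dst.write(head)
        # Stream the rest in chunks so a large file is not loaded into memory.
        shutil.copyfileobj(src, dst, chunk_size)
    return has_bom
```

After calling strip_utf8_bom("file.json", "file_nobom.json"), you could point the original spark.read call at file_nobom.json and see whether the schema is inferred. If the function returns False, the file had no BOM and the cause is likely something else.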