I have a JSON file with about 1,200,000 records. I want to read this file with PySpark like this:
spark.read.option("multiline","true").json('file.json')
But it causes this error:
AnalysisException: Unable to infer schema for JSON. It must be specified manually.
When I create a JSON file with a smaller number of records taken from the main file, this code reads it without a problem.
I can read this JSON file with pandas when I set the encoding to utf-8-sig:
pd.read_json("file.json", encoding='utf-8-sig')
How can I solve this problem?
CodePudding user response:
Try this out:
spark.read.option("multiline","true").option("inferSchema", "true").json('file.json')
CodePudding user response:
Since setting the encoding helps in pandas, maybe the following is what you need:
spark.read.json("file.json", multiLine=True, encoding="utf8")