How do I write a schema for the JSON below?
"place_results": {
"title": "W2A Architects",
"place_id": "ChIJ4SUGuHw5xIkRAl0856nZrBM",
"data_id": "0x89c4397cb80625e1:0x13acd9a9e73c5d02",
"data_cid": "1417747306467056898",
"reviews_link": "httpshl=en",
"photos_link": "https=en",
"gps_coordinates": {
"latitude": 40.6027801,
"longitude": -75.4701499
},
"place_id_search": "http",
"rating": 3.7,
I am getting nulls with the schema below. How do I know the correct data type to use?
StructField('place_results', StructType([
    StructField('address', StringType(), True),
    StructField('data_cid', StringType(), True),
    StructField('data_id', StringType(), True),
    StructField('gps_coordinates', StringType(), True),
    StructField('open_state', StringType(), True),
    StructField('phone', StringType(), True),
    StructField('website', StringType(), True)
])),
Answer:
This should work:
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField('place_results', StructType([
        StructField('data_cid', StringType(), True),
        StructField('data_id', StringType(), True),
        # gps_coordinates is a nested JSON object, so it gets its own StructType
        StructField('gps_coordinates', StructType([
            StructField('latitude', DoubleType(), True),
            StructField('longitude', DoubleType(), True)
        ]), True),
        StructField('photos_link', StringType(), True),
        StructField('place_id', StringType(), True),
        StructField('place_id_search', StringType(), True),
        StructField('rating', DoubleType(), True),
        StructField('reviews_link', StringType(), True),
        StructField('title', StringType(), True)
    ]), True)
])
I got this by letting Spark infer the schema from a sample file and reading the resulting DataFrame's schema:
spark.read.option("multiLine", True).json("dbfs:/test/sample.json").schema
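If you want to see which types inference would pick without running Spark, the default rules can be roughly approximated in plain Python. This is a minimal sketch under stated assumptions, not Spark's actual implementation: spark_type is a hypothetical helper that maps parsed JSON values to the type names Spark's JSON reader typically infers (quoted values to string, whole numbers to bigint, i.e. LongType, decimals to double, objects to struct with fields sorted by name) and prints them in the style of StructType.simpleString(). The sample is a trimmed copy of the JSON from the question.

```python
import json


def spark_type(value):
    """Hypothetical helper: approximate Spark's default JSON type inference."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"  # JSON integers are inferred as LongType ("bigint")
    if isinstance(value, float):
        return "double"  # JSON decimals are inferred as DoubleType
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        # Inferred struct fields come out sorted by name
        fields = ",".join(f"{k}:{spark_type(v)}" for k, v in sorted(value.items()))
        return f"struct<{fields}>"
    if isinstance(value, list):
        # Assumption: only the first element is inspected; Spark merges all elements
        return f"array<{spark_type(value[0]) if value else 'string'}>"
    return "string"  # null: an all-null field ends up as string


# Trimmed copy of the sample from the question
sample = json.loads("""
{"place_results": {
    "title": "W2A Architects",
    "data_cid": "1417747306467056898",
    "gps_coordinates": {"latitude": 40.6027801, "longitude": -75.4701499},
    "rating": 3.7}}
""")

print(spark_type(sample))
```

Note that data_cid stays a string because it is quoted in the JSON; only unquoted numbers such as rating and the coordinates become numeric types.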