Specifying column with multiple datatypes in Spark Schema


I am trying to create a schema to parse JSON into a Spark DataFrame.

The JSON has a column `value` that can be either a struct or a string:

"value": {
    "entity-type": "item",
    "id": "someid",
    "numeric-id": 30
  }

"value": "SomePicture.jpg",

How can I specify that in the schema?

CodePudding user response:

In JSON Schema you can declare a union of types:

{
  "type": ["object", "string"],
  "properties": { ... }
}

See https://json-schema.org/understanding-json-schema/index.html. Note that this applies to JSON Schema documents, not to a Spark `StructType`.

CodePudding user response:

Solved it using the approach below.

In JSON Schema we can express this the way you specified above, but it doesn't work when defining a Spark schema. So for the Spark schema I read the value in as a string, determined (based on certain conditions) whether the value is actually going to be a struct, and then used `from_json(value, new StructType())` to convert the string back into a struct.
