Suppose I have a case class as follows:
final case class Person(name: String, age: Int)
I'd like to create a single-column DataFrame with a complex StructType of Person. I want Spark to infer the schema.
val data = Seq(Person("Tom", 30), Person("Anna", 35))
val df = spark.createDataFrame(data)
I want Spark to infer that the DataFrame has a single column with the complex type Person. Currently, it splits Person up into multiple columns.
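For reference, a minimal sketch of the behavior described above; spark here is an assumed local SparkSession, so the Spark calls are shown as comments:

```scala
// Sketch of the question's setup. Spark's Product encoder flattens the
// case class fields into top-level columns rather than one struct column.
final case class Person(name: String, age: Int)

val data = Seq(Person("Tom", 30), Person("Anna", 35))

// With a live SparkSession `spark`:
//   val df = spark.createDataFrame(data)
//   df.printSchema
//   // root
//   //  |-- name: string
//   //  |-- age: integer
// i.e. two columns (name, age), not a single Person struct column.
```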
CodePudding user response:
You can either use:
final case class PersonAttributes(name: String, age: Int)
final case class Person(attributes: PersonAttributes)
then:
val data = Seq(
Person(PersonAttributes("Tom", 30)),
Person(PersonAttributes("Anna", 35))
)
Or you can create the DataFrame as you are doing, then use withColumn with struct to build the complex structure you want:
.withColumn("data", struct(col("name"), col("age")))
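Putting that second option together, here is a hedged end-to-end sketch. The drop of the flat columns is an assumption the answer implies but does not show; spark is an assumed SparkSession, so the Spark-dependent lines are commented:

```scala
// End-to-end sketch of the withColumn + struct approach.
final case class Person(name: String, age: Int)

val data = Seq(Person("Tom", 30), Person("Anna", 35))

// With a live SparkSession `spark`:
//   import org.apache.spark.sql.functions.{col, struct}
//   val df = spark.createDataFrame(data)
//     .withColumn("data", struct(col("name"), col("age")))
//     .drop("name", "age")   // keep only the single struct column (assumption)
//   df.printSchema
//   // root
//   //  |-- data: struct
//   //  |    |-- name: string
//   //  |    |-- age: integer
```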
I am not aware of any other way to read the Person case class directly into a complex structure. Good luck!
CodePudding user response:
You can map the data to the desired structure.
A helper class:
case class PersonWrapper(person: Person)
Now there are two options:
- Mapping the scala sequence before creating the Spark dataframe:
val df = spark.createDataFrame(data.map(PersonWrapper(_)))
or
- mapping the Spark dataset (this needs import spark.implicits._ in scope for the encoders, and yields a Dataset[PersonWrapper]):
val ds = spark.createDataset(data).map(PersonWrapper(_))
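As a runnable shape check of the wrapper approach (pure Scala; the createDataFrame/createDataset calls themselves need a live SparkSession, so they stay as comments):

```scala
// The wrapper nests Person one level down, so Spark's encoder sees a
// single top-level field and infers one struct column for it.
final case class Person(name: String, age: Int)
final case class PersonWrapper(person: Person)

val data = Seq(Person("Tom", 30), Person("Anna", 35))
val wrapped = data.map(PersonWrapper(_))

// With a SparkSession `spark` in scope:
//   val df = spark.createDataFrame(wrapped)
//   // single column: person struct<name:string, age:int>
```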