Spark: Create Single Column Complex Type Dataframe

Time:10-02

Suppose I have a case class as follows:

final case class Person(name: String, age: Int)

I'd like to create a single column dataframe that has a complex StructType of Person. I want spark to infer the schema.

val data = Seq(Person("Tom", 30), Person("Anna", 35))

val df = spark.createDataFrame(data)

I want Spark to infer that the DataFrame has a single column whose type is the complex type Person. Currently, it splits Person up into multiple columns (name and age).

CodePudding user response:

You can either use:

final case class PersonAttributes(name: String, age: Int)
final case class Person(attributes: PersonAttributes)

then:

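val data = Seq(
  Person(PersonAttributes("Tom", 30)),
  Person(PersonAttributes("Anna", 35))
)

// Schema inference now yields one struct column named "attributes"
val df = spark.createDataFrame(data)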

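Or you can create the DataFrame as you do now, then use withColumn with struct (both col and struct come from org.apache.spark.sql.functions) to build the complex column you want. Because withColumn keeps the original name and age columns, select the new column afterwards to end up with a single column:

.withColumn("data", struct(col("name"), col("age")))
.select("data")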
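
For reference, here is a minimal self-contained sketch of the struct approach. The object name StructColumnSketch, the helper toSingleStructColumn, the column name "data" and the local master setting are my own assumptions for illustration, not part of the question. It uses select directly instead of withColumn so that only the struct column remains:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

final case class Person(name: String, age: Int)

object StructColumnSketch {
  // Hypothetical helper: lets Spark infer the flat (name, age) schema,
  // then packs both columns into one struct column called "data".
  def toSingleStructColumn(spark: SparkSession, people: Seq[Person]): DataFrame =
    spark.createDataFrame(people)
      .select(struct(col("name"), col("age")).as("data"))

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-column-sketch")
      .getOrCreate()

    val df = toSingleStructColumn(spark, Seq(Person("Tom", 30), Person("Anna", 35)))
    df.printSchema()                              // one top-level column: data
    df.select("data.name", "data.age").show()     // nested fields stay addressable

    spark.stop()
  }
}
```

The nested fields can still be reached with dot notation ("data.name"), so nothing is lost by packing them into a struct.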

I am not aware of any other way to read the Person case class directly into a complex structure. Good luck!

CodePudding user response:

You can map the data to the desired structure.

A helper class:

case class PersonWrapper(person: Person)

Now there are two options:

  1. Mapping the Scala sequence before creating the Spark DataFrame:
val df = spark.createDataFrame(data.map(PersonWrapper(_)))

or

  2. Mapping the Spark Dataset after creating it (this needs import spark.implicits._ in scope for the encoders):
val df = spark.createDataset(data).map(PersonWrapper(_)).toDF()
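
Both options put together as a runnable sketch. The object name WrapperSketch, the method names wrapThenCreate and createThenWrap, and the local master setting are assumptions made up for this example; Person and PersonWrapper are the classes from above:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

final case class Person(name: String, age: Int)
final case class PersonWrapper(person: Person)

object WrapperSketch {
  // Option 1: wrap each element of the Scala Seq, then let Spark infer the schema.
  def wrapThenCreate(spark: SparkSession, people: Seq[Person]): DataFrame =
    spark.createDataFrame(people.map(PersonWrapper(_)))

  // Option 2: build a Dataset[Person] first, then wrap inside Spark.
  def createThenWrap(spark: SparkSession, people: Seq[Person]): DataFrame = {
    import spark.implicits._ // supplies the encoders for Person and PersonWrapper
    spark.createDataset(people).map(PersonWrapper(_)).toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wrapper-sketch")
      .getOrCreate()

    val data = Seq(Person("Tom", 30), Person("Anna", 35))

    wrapThenCreate(spark, data).printSchema()   // single struct column: person
    createThenWrap(spark, data).printSchema()   // same column layout

    spark.stop()
  }
}
```

The column name comes from the wrapper's field name, so renaming the field in PersonWrapper renames the column. For a small local Seq, option 1 is probably the simpler choice, since the wrapping happens before Spark is involved at all.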