In PySpark, we can create a struct column directly in createDataFrame by passing nested tuples, such as ("b", "c") and ("e", "f") in the following example:
df = spark.createDataFrame([
    ["a", ("b", "c")],
    ["d", ("e", "f")]
])
df.printSchema()
# root
#  |-- _1: string (nullable = true)
#  |-- _2: struct (nullable = true)
#  |    |-- _1: string (nullable = true)
#  |    |-- _2: string (nullable = true)
df.show()
# +---+------+
# | _1|    _2|
# +---+------+
# |  a|{b, c}|
# |  d|{e, f}|
# +---+------+
Is there a similar way in Scala to create a struct schema inside createDataFrame, without using org.apache.spark.sql.functions?
Answer:
For your specific example, you can use tuples and call the createDataFrame overload that accepts a Seq of Product types (tuples or case classes):
val df = spark.createDataFrame(Seq(
  ("a", ("b", "c")),
  ("d", ("e", "f"))
))
df.printSchema()
/*
root
 |-- _1: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: string (nullable = true)
*/
df.show()
/*
+---+------+
| _1|    _2|
+---+------+
|  a|[b, c]|
|  d|[e, f]|
+---+------+
*/
Instead of ("b", "c") one can also use "b" -> "c" to create a tuple of length 2.
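As a quick check of the arrow syntax, here is a small plain-Scala sketch (no Spark needed; the value names are only illustrative and not part of the original example). It shows that both spellings give the same Tuple2, and that a nested row needs explicit parentheses around the inner pair.

```scala
// Plain Scala, no Spark required. All value names here are illustrative.

// The two spellings build the same Tuple2.
val viaParens: (String, String) = ("b", "c")
val viaArrow: (String, String) = "b" -> "c"

// A nested row needs explicit parentheses around the inner pair.
val nestedRow: (String, (String, String)) = "a" -> ("b" -> "c")

// Without parentheses, -> associates to the left and nests the other way round.
val leftNested: ((String, String), String) = "a" -> "b" -> "c"
```

Since nestedRow has the same type as the tuples in the createDataFrame call above, a Seq of such values should be accepted in the same way.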
Preferred method
Tuples become difficult to manage when there are many fields, especially nested ones. You will likely want to model your data with case classes instead, which also lets you specify the struct field names and types:
case class Person(name: String, age: Int)
case class Car(manufacturer: String, model: String, mileage: Double, owner: Person)
val df = spark.createDataFrame(Seq(
  Car("Toyota", "Camry", 81400.8, Person("John", 37)),
  Car("Honda", "Accord", 152090.2, Person("Jane", 25))
))
df.printSchema()
/*
root
 |-- manufacturer: string (nullable = true)
 |-- model: string (nullable = true)
 |-- mileage: double (nullable = false)
 |-- owner: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- age: integer (nullable = false)
*/
df.show()
/*
+------------+------+--------+----------+
|manufacturer| model| mileage|     owner|
+------------+------+--------+----------+
|      Toyota| Camry| 81400.8|[John, 37]|
|       Honda|Accord|152090.2|[Jane, 25]|
+------------+------+--------+----------+
*/