I'm trying to manually create a dataset with a column of type Set:
case class Files(Record: String, ids: Set)
val files = Seq(
Files("202110260931", Set(770010, 770880)),
Files("202110260640", Set(770010, 770880)),
Files("202110260715", Set(770010, 770880))
).toDS()
files.show()
This gives me the error:
>command-1888379816641405:10: error: type Set takes type parameters
case class Files(s3path: String, ids: Set)
What am I doing wrong?
CodePudding user response:
Set is a parameterized type, so when you declare it in your Files case class, you have to specify the type of the elements it holds, such as Set[Int] for a set of integers. Your Files case class definition should therefore be:
case class Files(Record: String, ids: Set[Int])
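To see the type parameter at work outside Spark, here is a minimal plain-Scala sketch. The object name SetTypeParamDemo and the value names are made up for illustration and are not part of the original code. The compiler infers Set[Int] from the literal Set(770010, 770880), which is why only the field declaration needs the explicit element type:

```scala
object SetTypeParamDemo {
  // The element type must be spelled out in the field declaration.
  case class Files(Record: String, ids: Set[Int])

  // Set(770010, 770880) is inferred as Set[Int], so it matches the field type.
  val file: Files = Files("202110260931", Set(770010, 770880))

  // A set drops duplicates, so this still holds only two elements.
  val deduplicated: Set[Int] = Set(770010, 770880, 770010)

  def main(args: Array[String]): Unit = {
    println(file.ids.contains(770010))
    println(deduplicated.size)
  }
}
```

If the ids were strings instead, the field would be declared as Set[String] and the literals written as Set("770010", "770880").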
The complete code to create a dataset with a set column is then:
import org.apache.spark.sql.SparkSession

object ToDataset {

  private val spark = SparkSession.builder()
    .master("local[*]")
    .appName("test-app")
    .config("spark.ui.enabled", "false")
    .config("spark.driver.host", "localhost")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    import spark.implicits._

    val files = Seq(
      Files("202110260931", Set(770010, 770880)),
      Files("202110260640", Set(770010, 770880)),
      Files("202110260715", Set(770010, 770880))
    ).toDS()

    files.show()
  }

  case class Files(Record: String, ids: Set[Int])
}
This prints the following dataset:
+------------+----------------+
|      Record|             ids|
+------------+----------------+
|202110260931|[770010, 770880]|
|202110260640|[770010, 770880]|
|202110260715|[770010, 770880]|
+------------+----------------+