The following code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val data1 = Seq(
  ("Android", 1, "2021-07-24 12:01:19.000", "play"),
  ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
  ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))
val schema1 = StructType(Array(
  StructField("device_id", StringType, true),
  StructField("video_id", IntegerType, true),
  StructField("event_timestamp", StringType, true),
  StructField("event_type", StringType, true)))
val spark = SparkSession.builder()
  .enableHiveSupport()
  .appName("PlayStop")
  .getOrCreate()
// This is the line that triggers the compile error
val transaction = spark.createDataFrame(data1, schema1)
produces the error:
Cannot resolve overloaded method 'createDataFrame'
Why does this happen, and how can I fix it?
Answer:
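If the schema Spark infers is good enough, the easiest way to create a DataFrame is to call toDF() with the column names. toDF() on a local Seq comes from Spark's implicit conversions, so you need import spark.implicits._ in scope. The inferred schema differs from schema1 only in nullability: video_id comes from a primitive Int, so it is marked non-nullable.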
val transaction = data1.toDF("device_id", "video_id", "event_timestamp", "event_type")
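A minimal self-contained sketch of the toDF() route, assuming spark-sql is on the classpath and that a local master (local[*]) is acceptable in place of the question's enableHiveSupport(). It prints the inferred schema so it can be compared with schema1:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session replaces the question's Hive-enabled one for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("PlayStop")
  .getOrCreate()

// Brings toDF() into scope for local Seqs of tuples and case classes.
import spark.implicits._

val data1 = Seq(
  ("Android", 1, "2021-07-24 12:01:19.000", "play"),
  ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
  ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

// Column types are inferred from the tuple: String -> StringType, Int -> IntegerType.
val transaction = data1.toDF("device_id", "video_id", "event_timestamp", "event_type")

// video_id is printed as nullable = false because it comes from a primitive Int.
transaction.printSchema()
```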
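To apply a custom schema, use one of the createDataFrame() overloads that take a schema: they accept an RDD[Row] (or a java.util.List[Row]) together with a StructType. None of them accepts a Seq of tuples plus a schema, which is why the compiler cannot resolve your call. In your case, you can transform data1 into an RDD[Row] like this: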
import org.apache.spark.sql.Row
val transaction = spark.createDataFrame(spark.sparkContext.parallelize(data1.map(t => Row.fromTuple(t))), schema1) // one Row field per tuple element
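A self-contained sketch of the RDD[Row] route under the same assumptions (spark-sql on the classpath, a local master instead of Hive). It uses Row.fromTuple because Row(t) for a tuple t builds a one-field Row that holds the whole tuple, which would not line up with the four fields of schema1:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]") // assumption: local run, no Hive
  .appName("PlayStop")
  .getOrCreate()

val data1 = Seq(
  ("Android", 1, "2021-07-24 12:01:19.000", "play"),
  ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
  ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

val schema1 = StructType(Array(
  StructField("device_id", StringType, true),
  StructField("video_id", IntegerType, true),
  StructField("event_timestamp", StringType, true),
  StructField("event_type", StringType, true)))

// Each 4-tuple becomes a 4-field Row, matching schema1 position by position.
val rows: Seq[Row] = data1.map(t => Row.fromTuple(t))

// Resolves to the createDataFrame(rowRDD: RDD[Row], schema: StructType) overload.
val transaction = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema1)

transaction.show(false)
```

Here the schema, including video_id's nullable = true, comes entirely from schema1 rather than from type inference.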
An alternative is to call toDF (again with import spark.implicits._ in scope), followed by rdd, which exposes a DataFrame (i.e. Dataset[Row]) as an RDD[Row]:
val transaction = spark.createDataFrame(data1.toDF.rdd, schema1)
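A sketch of the toDF().rdd round trip, under the same assumptions as before. Without column names, toDF() produces columns _1 to _4 and a non-nullable second column; .rdd discards that inferred schema, and createDataFrame then applies schema1, so both names and nullability come from schema1:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]") // assumption: local run instead of Hive
  .appName("PlayStop")
  .getOrCreate()
import spark.implicits._

val data1 = Seq(
  ("Android", 1, "2021-07-24 12:01:19.000", "play"),
  ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
  ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

val schema1 = StructType(Array(
  StructField("device_id", StringType, true),
  StructField("video_id", IntegerType, true),
  StructField("event_timestamp", StringType, true),
  StructField("event_type", StringType, true)))

// Inferred schema: columns _1.._4, with _2 (the Int) non-nullable.
val inferred = data1.toDF()

// .rdd yields RDD[Row]; createDataFrame re-applies schema1 positionally.
val transaction = spark.createDataFrame(inferred.rdd, schema1)

transaction.printSchema()
```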