Cannot resolve overloaded method 'createDataFrame'

Time:06-25

The following code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val data1 = Seq(("Android", 1, "2021-07-24 12:01:19.000", "play"), ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"), ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

val schema1 = StructType(Array(StructField("device_id", StringType, true),
  StructField("video_id", IntegerType, true),
  StructField("event_timestamp", StringType, true),
  StructField("event_type", StringType, true)
))

val spark = SparkSession.builder()
  .enableHiveSupport()
  .appName("PlayStop")
  .getOrCreate()

var transaction=spark.createDataFrame(data1, schema1)

produces the error:

Cannot resolve overloaded method 'createDataFrame'

Why?

And how to fix it?

Answer:

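If you don't need the exact nullability declared in schema1, the easiest way to create the DataFrame is toDF(), which takes the column types from the tuple elements (video_id becomes a non-nullable integer because it comes from a Scala Int). It needs the implicits of your SparkSession in scope: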

import spark.implicits._
val transaction = data1.toDF("device_id", "video_id", "event_timestamp", "event_type")
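A self-contained sketch of the toDF() route, assuming Spark 3.x (spark-sql) on the classpath and a local master; the question's session uses enableHiveSupport(), which is not needed for this. The object name ToDFSketch is hypothetical. It shows that toDF comes from the session's implicits and that the inferred video_id column is non-nullable:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToDFSketch {
  // Assumption: a local SparkSession is enough for this demo
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("PlayStop")
    .getOrCreate()

  val data1 = Seq(
    ("Android", 1, "2021-07-24 12:01:19.000", "play"),
    ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
    ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
    ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

  lazy val transaction: DataFrame = {
    // toDF on a local Seq is provided by this session's implicits
    import spark.implicits._
    data1.toDF("device_id", "video_id", "event_timestamp", "event_type")
  }

  def main(args: Array[String]): Unit = {
    // video_id is printed as "integer (nullable = false)" since it comes from a Scala Int
    transaction.printSchema()
    transaction.show(false)
    spark.stop()
  }
}
```

If a nullable video_id matters downstream (for example to match an existing table), one of the createDataFrame variants with an explicit schema is the better fit.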

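The original call fails because no overload of createDataFrame() accepts a Seq of tuples together with a StructType. The overloads that take a schema expect the data as an RDD[Row] (or a JavaRDD[Row] or java.util.List[Row]), while the overload that does accept a Seq of tuples infers the schema itself and takes no schema argument. To keep your custom schema, transform data1 into an RDD[Row]: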

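import org.apache.spark.sql.Row
// Row.fromTuple spreads each tuple into four fields; Row(_) would wrap the whole tuple as a single field
val transaction = spark.createDataFrame(spark.sparkContext.parallelize(data1.map(t => Row.fromTuple(t))), schema1)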
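For small local data, the java.util.List[Row] overload of createDataFrame also accepts an explicit schema and avoids building an RDD. A minimal sketch, assuming Spark 3.x and a local master; the object name RowListSketch is hypothetical. It also relies on Row.fromTuple, because Row.apply is variadic and Row(t) on a tuple produces a one-field Row that would not line up with the four fields of schema1:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RowListSketch {
  // Assumption: a local SparkSession is enough for this demo
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("PlayStop")
    .getOrCreate()

  val data1 = Seq(
    ("Android", 1, "2021-07-24 12:01:19.000", "play"),
    ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
    ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
    ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

  val schema1 = StructType(Array(
    StructField("device_id", StringType, true),
    StructField("video_id", IntegerType, true),
    StructField("event_timestamp", StringType, true),
    StructField("event_type", StringType, true)))

  // One four-field Row per tuple, matching schema1 positionally
  val rows: Seq[Row] = data1.map(t => Row.fromTuple(t))

  // Uses createDataFrame(rows: java.util.List[Row], schema: StructType)
  lazy val transaction: DataFrame =
    spark.createDataFrame(java.util.Arrays.asList(rows: _*), schema1)

  def main(args: Array[String]): Unit = {
    transaction.printSchema()
    transaction.show(false)
    spark.stop()
  }
}
```

The RDD-based overload remains the natural choice when the rows are already distributed; the list variant is mainly a convenience for small, driver-side data like this test fixture.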

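An alternative is to call toDF first (yielding columns _1 to _4; this again needs import spark.implicits._), followed by .rdd, which exposes the DataFrame (i.e. Dataset[Row]) as an RDD[Row]. createDataFrame then applies the column names and nullability from schema1: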

val transaction = spark.createDataFrame(data1.toDF.rdd, schema1)
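The two steps of this route can be sketched as follows, assuming Spark 3.x and a local master; ToDFRddSketch and tupleDF are hypothetical names. The intermediate DataFrame carries the default tuple column names and a non-nullable second column, and createDataFrame replaces both with what schema1 declares:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ToDFRddSketch {
  // Assumption: a local SparkSession is enough for this demo
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("PlayStop")
    .getOrCreate()

  val data1 = Seq(
    ("Android", 1, "2021-07-24 12:01:19.000", "play"),
    ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
    ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
    ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

  val schema1 = StructType(Array(
    StructField("device_id", StringType, true),
    StructField("video_id", IntegerType, true),
    StructField("event_timestamp", StringType, true),
    StructField("event_type", StringType, true)))

  // Step 1: toDF() without names gives columns _1 .. _4 with inferred types
  lazy val tupleDF: DataFrame = {
    import spark.implicits._
    data1.toDF()
  }

  // Step 2: .rdd yields RDD[Row]; the (RDD[Row], StructType) overload
  // re-labels the same values with schema1's names and nullability
  lazy val transaction: DataFrame = spark.createDataFrame(tupleDF.rdd, schema1)

  def main(args: Array[String]): Unit = {
    tupleDF.printSchema()
    transaction.printSchema()
    spark.stop()
  }
}
```

The field types in schema1 must still match the values positionally (String, Int, String, String here); createDataFrame does not cast between types.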