In my code I get the following error:
error: not found: value transform
.withColumn("min_date", array_min(transform('min_date,
^
I have been unable to resolve this. I already have the following import statements:
import sqlContext.implicits._
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions._
I'm using Apache Zeppelin to execute this.
Here is a sample of the dataset I'm using, followed by the full code for reference:
1004,bb5469c5|2021-09-19 01:25:30,4f0d-bb6f-43cf552b9bc6|2021-09-25 05:12:32,1954f0f|2021-09-19 01:27:45,4395766ae|2021-09-19 01:29:13,
1018,36ba7a7|2021-09-19 01:33:00,
1020,23fe40-4796-ad3d-6d5499b|2021-09-19 01:38:59,77a90a1c97b|2021-09-19 01:34:53,
1022,3623fe40|2021-09-19 01:33:00,
1028,6c77d26c-6fb86|2021-09-19 01:50:50,f0ac93b3df|2021-09-19 01:51:11,
1032,ac55-4be82f28d|2021-09-19 01:54:20,82229689e9da|2021-09-23 01:19:47,
val users = sc.textFile("path to file")
  .map(x => x.replaceAll("\\(", ""))
  .map(x => x.replaceAll("\\)", ""))
  .map(x => x.replaceFirst(",", "*"))
  .toDF("column")
val tempDF = users.withColumn("_tmp", split($"column", "\\*")).select(
$"_tmp".getItem(0).as("col1"),
$"_tmp".getItem(1).as("col2")
)
val output = tempDF.withColumn("min_date", split('col2 , ","))
.withColumn("min_date", array_min(transform('min_date,
c => to_timestamp(regexp_extract(c, "\\|(.*)$", 1)))))
.show(10,false)
CodePudding user response:
There is no method in functions with the signature transform(c: Column, fn: Column => Column) before Spark 3.0.0, so you're either importing the wrong object or compiling against an older Spark version.
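If in doubt, you can check which version the Zeppelin interpreter is actually running; the spark session object is available in a Spark paragraph:

// Check the running Spark version from a Zeppelin paragraph.
println(spark.version)
// functions.transform(column: Column, f: Column => Column) exists from Spark 3.0.0 on.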
CodePudding user response:
You are probably using a Spark version < 3.x, where this Scala DataFrame API transform
does not exist. With Spark 3.x your code works fine.
I could not get it to work on 2.4 either; not enough time to dig further, but have a look here: Higher Order functions in Spark SQL
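On Spark 2.4 the same higher-order function is still reachable through SQL. A minimal sketch of that route, assuming the tempDF and implicits from the question (the Column-based overload is missing there, but expr() can invoke the SQL transform directly, and array_min is available from 2.4):

import org.apache.spark.sql.functions._

// Spark 2.4: call the SQL higher-order function `transform` via expr(),
// since the Column => Column overload only exists from Spark 3.0.
val output = tempDF
  .withColumn("min_date", split('col2, ","))
  .withColumn("min_date", array_min(expr(
    """transform(min_date, c -> to_timestamp(regexp_extract(c, '\\|(.*)$', 1)))""")))

output.show(10, false)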