If I have a list/Seq of columns in Scala like:
val partitionsColumns = "p1,p2"
val partitionsColumnsList = partitionsColumns.split(",").toList
I can easily use it in partitionBy or groupBy, like:
val windowFunction = Window.partitionBy(partitionsColumnsList:_*)
.orderBy(df("some_date").desc)
But if I want to do the same thing with the Spark Java API, what should I do?
List<String> partitions = new ArrayList<>();
partitions.add("p1");
partitions.add("p2");
WindowSpec windowSpec = Window.partitionBy(.....)
.orderBy(desc("some_date"));
CodePudding user response:
Some IDEs, such as IntelliJ IDEA, support both Scala and Java, and when you paste Java code into a Scala class, the IDE converts it gracefully.
However, you can use the following for your desired operation in Java:
WindowSpec windowSpec = Window.partitionBy("p1","p2").orderBy(col("some_date").desc());
If you have a list of columns, you can pass a Seq to the partitionBy method:
List<Column> partitions = new ArrayList<>();
partitions.add(col("p1"));
partitions.add(col("p2"));
Seq<Column> seqPartitions = JavaConverters.asScalaIteratorConverter(partitions.iterator()).asScala().toSeq();
WindowSpec windowSpec = Window.partitionBy(seqPartitions).orderBy(col("some_date").desc());
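As a possible alternative to building a Scala Seq: Spark's Java API also exposes a varargs overload, partitionBy(Column... cols), and Java accepts a plain array wherever a varargs parameter is declared. The sketch below shows only that list-to-array step. To stay runnable without Spark on the classpath, it uses a hypothetical Col class in place of org.apache.spark.sql.Column and a hypothetical partitionBy stand-in in place of Window.partitionBy; neither is Spark code.

```java
import java.util.ArrayList;
import java.util.List;

class ListToVarargs {
    // Hypothetical stand-in with the same shape as a varargs overload
    // partitionBy(Column... cols); it only echoes the column names it receives.
    static String partitionBy(Col... cols) {
        StringBuilder sb = new StringBuilder("partitionBy(");
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(cols[i].name);
        }
        return sb.append(")").toString();
    }

    // Copies the list into an array that can be passed to a Col... parameter.
    static Col[] toVarargs(List<Col> cols) {
        return cols.toArray(new Col[0]);
    }

    public static void main(String[] args) {
        List<Col> partitions = new ArrayList<>();
        partitions.add(new Col("p1"));
        partitions.add(new Col("p2"));
        System.out.println(partitionBy(toVarargs(partitions))); // prints partitionBy(p1, p2)
    }
}

// Hypothetical stand-in for org.apache.spark.sql.Column (assumption, not Spark code).
final class Col {
    final String name;
    Col(String name) { this.name = name; }
}
```

With real Spark types, the equivalent call would presumably be Window.partitionBy(partitions.toArray(new Column[0])), which avoids the Scala conversion classes entirely.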
CodePudding user response:
partitionBy has two signatures that take a Scala Seq:
partitionBy(Seq<Column> cols)
partitionBy(String colName, Seq<String> colNames)
So you can choose either of the two. Let's say that partitions is a list of String. It would go like this:
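import java.util.List;
import java.util.stream.Collectors;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import scala.collection.JavaConversions;
import scala.collection.Seq;
// Note: JavaConversions is deprecated as of Scala 2.12; JavaConverters is its replacement.
// Option 1: map each name to a Column and pass a Seq<Column>
List<Column> columns = partitions.stream()
    .map(functions::col)
    .collect(Collectors.toList());
Seq<Column> columnSeq = JavaConversions.asScalaBuffer(columns).toSeq();
WindowSpec windowSpec = Window.partitionBy(columnSeq);
// OR
// Option 2: pass the first name, then the remaining names as a Seq<String>
Seq<String> columnSeq2 = JavaConversions.asScalaBuffer(partitions).toSeq();
WindowSpec windowSpec2 = Window
    .partitionBy(partitions.get(0), columnSeq2.tail().toSeq());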
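The "first name plus the rest" split used in the second option can also be done with plain Java collections, which may be handy because Spark's Java API additionally offers a partitionBy(String colName, String... colNames) varargs overload. Here is a minimal sketch of that split, assuming a non-empty list. The partitionBy method in it is a hypothetical stand-in with the same parameter shape, included only so the example runs without Spark.

```java
import java.util.Arrays;
import java.util.List;

class HeadTailVarargs {
    // Hypothetical stand-in with the shape partitionBy(String colName, String... colNames);
    // it only formats its arguments so the split is visible.
    static String partitionBy(String colName, String... colNames) {
        return colName + " | " + String.join(",", colNames);
    }

    // Returns every element after the first as an array usable for a String... parameter.
    // Assumes the list is non-empty; an empty list makes subList throw.
    static String[] tail(List<String> names) {
        return names.subList(1, names.size()).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("p1", "p2", "p3");
        // First element goes to colName, the remainder to the varargs parameter.
        System.out.println(partitionBy(partitions.get(0), tail(partitions))); // prints p1 | p2,p3
    }
}
```

With Spark on the classpath, the same split would presumably be passed as Window.partitionBy(partitions.get(0), tail(partitions)), with no Seq conversion needed.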