I have a JavaPairRDD built as follows:
JavaPairRDD<String, Iterable<Row>> rdd = mydataset
    .orderBy("orderfield1", "orderfield2")
    .javaRDD()
    .mapToPair(row -> new Tuple2<>(row.getAs("id").toString(), row))
    .groupByKey();
Because groupByKey() doesn't preserve ordering, the orderBy has no effect here.
I want to order each Iterable<Row> by some of the fields from the dataset.
CodePudding user response:
You could copy the Iterable into a List and then sort that list, as shown below. I assume that your sorting field is called x and that it is of type String, but you can adapt that to your specific case.
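import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Row;
import scala.Tuple2;

// Name of the column to sort by; assumed here to hold String values.
String sortingField = "x";
JavaPairRDD<String, List<Row>> rdd = mydataset
    .javaRDD()
    .mapToPair(row -> new Tuple2<>(row.getAs("id").toString(), row))
    .groupByKey()
    .mapValues(it -> {
        // Copy the grouped rows into a List so they can be sorted.
        List<Row> rows = new ArrayList<>();
        it.forEach(rows::add);
        rows.sort(
            (Row a, Row b) -> a.<String>getAs(sortingField).compareTo(b.<String>getAs(sortingField))
        );
        return rows;
    });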
Note that this is much simpler to write in Scala:
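val rdd = mydataset
  .rdd
  // getAs needs an explicit type parameter in Scala; without one it is inferred as Nothing and the cast fails at runtime.
  .map(row => (row.getAs[Any]("id").toString, row))
  .groupByKey
  .mapValues(_.toSeq.sortBy(_.getAs[String]("x")))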
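Since the question orders by two fields (orderfield1 and orderfield2), here is a minimal sketch of the same copy-and-sort step with a chained comparator. It runs without Spark: Rec is a hypothetical stand-in for Row, and the field types (a String and an int) are assumptions, as are the names sortedCopy and BY_BOTH_FIELDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortGroupedValues {

    // Hypothetical stand-in for a Spark Row holding the two ordering fields.
    static final class Rec {
        final String orderfield1;
        final int orderfield2;

        Rec(String orderfield1, int orderfield2) {
            this.orderfield1 = orderfield1;
            this.orderfield2 = orderfield2;
        }

        @Override
        public String toString() {
            return orderfield1 + ":" + orderfield2;
        }
    }

    // Copies an Iterable into a new List and sorts it with the given comparator.
    static <T> List<T> sortedCopy(Iterable<T> values, Comparator<? super T> order) {
        List<T> out = new ArrayList<>();
        values.forEach(out::add);
        out.sort(order);
        return out;
    }

    // Orders by orderfield1 first, then by orderfield2 to break ties.
    static final Comparator<Rec> BY_BOTH_FIELDS =
        Comparator.comparing((Rec r) -> r.orderfield1)
                  .thenComparingInt(r -> r.orderfield2);

    public static void main(String[] args) {
        Iterable<Rec> group = Arrays.asList(
            new Rec("b", 2), new Rec("a", 9), new Rec("b", 1));
        System.out.println(sortedCopy(group, BY_BOTH_FIELDS)); // prints [a:9, b:1, b:2]
    }
}
```

With real Row values, the key extractors would presumably become something like r -> r.<String>getAs("orderfield1"), and the mapValues body would reduce to it -> sortedCopy(it, comparator). Keeping the comparator in a static field means the lambda passed to mapValues doesn't have to capture it.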