Get the last 5 lines of a 1000 line csv, RDD Spark Java


I have a .csv file that has 1000 lines of data in it, and I'm trying to write a line of code that will show only the last 5 lines of data.

    private SparkSession spark;
    private JavaSparkContext sc;
    private JavaRDD<String> lines;
    private JavaRDD<PurchaseOrder> orders;

    public OrderProcessingRDDSparkApp(String... args) throws IOException {
        spark = SparkSession.builder().appName("OrderProcessingSparkApp").config("spark.master", "local[1]").getOrCreate();
        sc = new JavaSparkContext(spark.sparkContext());
        lines = sc.textFile(args[0]);
        orders = lines.map(line -> new PurchaseOrder(line));

What can I try to resolve this?

CodePudding user response:

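    // Assumes rdd holds CSV rows already split into fields (e.g. RDD[Array[String]]).
    // Sort descending on the 4th field so that the rows you want come first.
    val sorted = rdd.sortBy(_.apply(3).toInt, ascending = false)
    // take(5) returns the first 5 rows of the sorted RDD, i.e. the 5 largest values.
    // This orders by column value, not by position in the file.
    sorted.take(5)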

The same approach can be written in Java.
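
If "last 5 lines" means the final five lines in file order rather than the five largest values of a column, one possible alternative is to index the lines by position and keep only the tail. The following is a minimal Java sketch under that assumption; the class name `LastLinesExample` and the helper `lastLines` are hypothetical and not part of the original question.

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Hypothetical example class; not part of the original application.
public class LastLinesExample {

    // Returns the last n lines of the RDD, in their original order.
    public static List<String> lastLines(JavaRDD<String> lines, int n) {
        long total = lines.count();               // number of lines in the file
        long start = Math.max(0L, total - n);     // index of the first line to keep
        return lines.zipWithIndex()               // pairs of (line, position)
                .filter(pair -> pair._2() >= start)
                .keys()                           // drop the index again
                .collect();                       // small result, safe to bring to the driver
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("LastLinesExample")
                .master("local[1]")
                .getOrCreate();
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        JavaRDD<String> lines = sc.textFile(args[0]);
        lastLines(lines, 5).forEach(System.out::println);

        spark.stop();
    }
}
```

This runs two passes over the data (`count` and then the filter), which is fine for a 1000-line file. The resulting lines can then be mapped to `PurchaseOrder` objects as in the question.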

Possibly a [duplicate]
