I have a .csv file containing 1000 lines of data, and I'm trying to write code that shows only the last 5 lines.
private SparkSession spark;
private JavaSparkContext sc;
private JavaRDD<String> lines;
private JavaRDD<PurchaseOrder> orders;

public OrderProcessingRDDSparkApp(String... args) throws IOException {
    spark = SparkSession.builder()
            .appName("OrderProcessingSparkApp")
            .config("spark.master", "local[1]")
            .getOrCreate();
    sc = new JavaSparkContext(spark.sparkContext());
    sc.setLogLevel("ERROR");
    lines = sc.textFile(args[0]);
    orders = lines.map(line -> new PurchaseOrder(line));
}
What can I try to resolve this?
CodePudding user response:
val sorted = rdd.sortBy(_.apply(3).toInt, ascending = false) // sort descending by the 4th field
sorted.take(5) // the 5 rows that come last in ascending order
You can adapt this approach in Java.
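If you want the last 5 lines in file order rather than by a sorted column, a common RDD pattern is zipWithIndex followed by a filter on the index (keep indices >= count - 5). The index arithmetic can be sketched without a Spark dependency using plain Java collections; the class and method names below are placeholders, and the comments show the corresponding RDD calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TailLines {
    // Keep only the last n elements, preserving order.
    // With Spark this corresponds to:
    //   long count = rdd.count();
    //   rdd.zipWithIndex()
    //      .filter(t -> t._2() >= count - n)
    //      .map(Tuple2::_1);
    static List<String> tail(List<String> lines, int n) {
        long count = lines.size();                  // rdd.count()
        return IntStream.range(0, lines.size())     // zipWithIndex
                .filter(i -> i >= count - n)        // keep the last n indices
                .mapToObj(lines::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            lines.add("row-" + i);
        }
        // Prints the final five rows: row-996 through row-1000
        System.out.println(tail(lines, 5));
    }
}
```

Unlike the sortBy version, this keeps the original file order and does not assume any particular column is sortable.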
Possibly a [duplicate]