Spark Java: how to take the maximum value per key? Help needed

Time:09-16

Business scenario: from the follow-up table, count the IDs whose most recent follow-up visit has an FBS value less than 7.
The follow-up table phs_visit is keyed by ID, and each ID has multiple visit records. For each ID I need the record with the largest visit time (visitDate), keep it only if its FBS is less than 7, and then count the total number of such IDs.
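The rule above does not actually need a sort: per ID only the record with the largest visitTime matters, which is a reduce by key with a "keep the later record" merge function. Below is a minimal plain-Java sketch of that logic (no Spark involved), assuming FBS is stored as a numeric string and visitTime is already in epoch milliseconds. The Visit class and the later and countLatestBelow methods are hypothetical names for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class LatestVisitCount {

    /** One follow-up record (hypothetical shape): ID, FBS as read from HBase, visit time in epoch millis. */
    static final class Visit {
        final String id;
        final String fbs;
        final long visitTime;

        Visit(String id, String fbs, long visitTime) {
            this.id = id;
            this.fbs = fbs;
            this.visitTime = visitTime;
        }
    }

    /** Merge function: of two visits for the same ID, keep the one with the larger visitTime. */
    static Visit later(Visit a, Visit b) {
        return a.visitTime >= b.visitTime ? a : b;
    }

    /** Counts IDs whose most recent visit has a (numeric) FBS below the threshold. */
    static long countLatestBelow(List<Visit> visits, double threshold) {
        // Reduce by key: exactly one surviving (latest) visit per ID, no sorting required.
        Map<String, Visit> latestById = visits.stream()
                .collect(Collectors.toMap(v -> v.id, Function.identity(), LatestVisitCount::later));

        return latestById.values().stream()
                .filter(v -> v.fbs != null && !v.fbs.isBlank()) // skip records with no FBS value
                .filter(v -> Double.parseDouble(v.fbs) < threshold)
                .count();
    }

    public static void main(String[] args) {
        List<Visit> visits = List.of(
                new Visit("A", "8.1", 100L),
                new Visit("A", "6.2", 200L),   // latest for A, below 7
                new Visit("B", "5.0", 100L),
                new Visit("B", "7.5", 300L));  // latest for B, not below 7
        System.out.println(countLatestBelow(visits, 7.0)); // prints 1
    }
}
```

In the Spark job, the same idea maps onto JavaPairRDD.reduceByKey applied to the visitRdd built in the code (key id, value (FBS, visitTime)), roughly visitRdd.reduceByKey((a, b) -> a._2() >= b._2() ? a : b), followed by a filter on the parsed FBS value and count(). That outline is untested; the point is that reduceByKey combines values within each partition before the shuffle and never sorts, so sortByKey is not needed at all.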

The sorting method I found online, sortByKey(), sorts by key and cannot sort by value. Can anyone help me work this out?

The code is as follows (how do I sort by visitTime and take the largest visitTime?):

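import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("visitReport"));

// HBaseUtils.readFromHBase is a helper already packaged in the project that reads the table from HBase
// (key type assumed to be ImmutableBytesWritable, as produced by TableInputFormat)
JavaPairRDD<ImmutableBytesWritable, Result> visitHbaseRDD = HBaseUtils.readFromHBase(ctx, "phs_visit");

// Key: id; value: (FBS, visit time in epoch milliseconds)
JavaPairRDD<String, Tuple2<String, Long>> visitRdd = visitHbaseRDD.mapToPair(tuple -> {
    Result rs = tuple._2();
    String id = "";        // patient number
    String fbs = "";       // FBS value
    String visitDate = ""; // follow-up time

    if (rs != null && rs.getRow() != null) {
        byte[] cf = Bytes.toBytes("CF");
        // HBase qualifiers are case-sensitive: check and read the same name
        byte[] idBytes = rs.getValue(cf, Bytes.toBytes("ID"));
        if (idBytes != null) {
            id = Bytes.toString(idBytes).trim();
        }
        byte[] fbsBytes = rs.getValue(cf, Bytes.toBytes("FBS"));
        if (fbsBytes != null) {
            fbs = Bytes.toString(fbsBytes).trim();
        }
        byte[] dateBytes = rs.getValue(cf, Bytes.toBytes("VISITDATE"));
        if (dateBytes != null) {
            visitDate = Bytes.toString(dateBytes).trim();
        }
    }

    long visitTime = 0L;
    if (!StringUtils.isBlank(visitDate)) {
        // MM = month, mm = minutes; with "HH:MM:ss" the minute digits would be read as a month
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        Date inputDate = dateFormat.parse(visitDate);
        visitTime = inputDate.getTime();
    }
    return new Tuple2<>(id, new Tuple2<>(fbs, visitTime));
});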
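The date conversion in the mapToPair step can also be written with java.time. A minimal sketch, assuming visitDate strings follow the "yyyy/MM/dd HH:mm:ss" pattern used in the question: DateTimeFormatter is immutable and thread-safe, so one shared instance can replace the per-record SimpleDateFormat. The VisitDates class and toEpochMillis method are hypothetical names; the zone is a parameter, and passing ZoneId.systemDefault() matches SimpleDateFormat's default behavior.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class VisitDates {

    // Immutable and thread-safe, so a single instance can be shared by all tasks.
    // Pattern letters: MM = month, mm = minutes.
    static final DateTimeFormatter VISIT_DATE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    /** Converts a visitDate string to epoch millis in the given zone; blank input maps to 0L like the question's code. */
    static long toEpochMillis(String visitDate, ZoneId zone) {
        if (visitDate == null || visitDate.isBlank()) {
            return 0L;
        }
        return LocalDateTime.parse(visitDate.trim(), VISIT_DATE_FORMAT)
                .atZone(zone)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2020/01/01 00:30:00", ZoneId.of("UTC"))); // prints 1577838600000
    }
}
```

Inside the lambda this would replace the SimpleDateFormat lines with visitTime = VisitDates.toEpochMillis(visitDate, ZoneId.systemDefault()).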

Reply:

HBase tables can store multiple versions of a cell. If phs_visit uses the ID as the rowkey, the multiple visit records for an ID are versions of the same row, and a plain read returns only the latest version by default, i.e. the one with the largest timestamp (the most recently written). So you directly get the most recent record for each rowkey, and then you can apply a filter on the FBS column value to keep only rows where FBS is less than 7.