How to do groupBy and filter based on maxDate record in Java List

Time:03-04

I have a Java POJO for collecting metrics, shown below:

public class Metric {

    Long metricId;
    Long resultKeyId;
    @NonNull DatasetType datasetType;
    @NonNull String datasetName;
    @NonNull String analyzerName;
    @NonNull String constraintAlias;
    @NonNull LocalDateTime entityDate;
    @NonNull long entityDurationSec;
    @NonNull Double metricValue;
    @NonNull String changedBy;
    Long jobId = 0L;
    Long codeArtifactId = 0L;
    LocalDateTime createdAt;
    LocalDateTime lastChanged;

}

I have a list of metrics built from the above POJO: List<Metric> metrics.

This list can contain multiple items, and I want to keep only one record for each combination of resultKeyId, datasetType, datasetName, analyzerName and constraintAlias: the one with the latest createdAt.

The SQL Representation of this would be something like :

select a.* from 
dataval_metric a 
join dataval_metric b 
on a.result_key_id=b.result_key_id 
and a.dataset_type=b.dataset_type 
and a.dataset_name=b.dataset_name 
and a.analyzer_name=b.analyzer_name 
and a.constraint_alias=b.constraint_alias  
where a.result_key_id = 434 
and a.mysql_row_created_at >= b.mysql_row_created_at;

I'm looking for pointers on how to do this efficiently in Java.

CodePudding user response:

You can use Collectors.groupingBy with the grouping fields as the key, and Collectors.maxBy to keep the latest record in each group.

The key can be:

  1. a List:
Map<List<Object>, Optional<Metric>> map = metrics.stream()
        .collect(Collectors.groupingBy(m ->
                        List.of(m.getResultKeyId(),
                                m.getDatasetType(),
                                m.getDatasetName(),
                                m.getAnalyzerName(),
                                m.getConstraintAlias()),
                Collectors.maxBy(Comparator.comparing(Metric::getCreatedAt))));
  2. an object of type Metric, if you override equals and hashCode so that they are based only on the fields you want to group by:
Map<Metric, Optional<Metric>> map = metrics.stream()
        .collect(Collectors.groupingBy(m -> m,
                Collectors.maxBy(Comparator.comparing(Metric::getCreatedAt))));
  3. another object with equals and hashCode overridden, such as Quintet from the javatuples library:
Map<Quintet<Long, DatasetType, String, String, String>, Optional<Metric>> map = metrics.stream()
        .collect(Collectors.groupingBy(m ->
                        // Quintet.with is the typed factory method, avoiding raw types
                        Quintet.with(m.getResultKeyId(),
                                m.getDatasetType(),
                                m.getDatasetName(),
                                m.getAnalyzerName(),
                                m.getConstraintAlias()),
                Collectors.maxBy(Comparator.comparing(Metric::getCreatedAt))));
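All three variants produce a map whose values are Optional<Metric>, so there is one more step if you want a plain List<Metric>. Below is a minimal sketch of that step, not a definitive implementation. It assumes a cut-down, hypothetical Metric record with only three of the grouping fields, and it assumes createdAt is never null. It also uses Arrays.asList for the key instead of List.of, because List.of throws a NullPointerException on null elements and resultKeyId is not marked @NonNull in the question.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class LatestPerGroup {

    // Hypothetical, cut-down stand-in for the Metric class from the question.
    record Metric(Long resultKeyId, String datasetName, String analyzerName,
                  LocalDateTime createdAt) { }

    // Returns one Metric per (resultKeyId, datasetName, analyzerName):
    // the one with the latest createdAt. Assumes createdAt is never null.
    static List<Metric> latestPerGroup(List<Metric> metrics) {
        Map<List<Object>, Optional<Metric>> byKey = metrics.stream()
                .collect(Collectors.groupingBy(
                        // Arrays.asList accepts null elements, unlike List.of.
                        m -> Arrays.<Object>asList(
                                m.resultKeyId(), m.datasetName(), m.analyzerName()),
                        Collectors.maxBy(Comparator.comparing(Metric::createdAt))));

        // Every group has at least one element, so each Optional is present.
        return byKey.values().stream()
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2022, 3, 4, 10, 0);
        List<Metric> metrics = List.of(
                new Metric(434L, "orders", "Completeness", t),
                new Metric(434L, "orders", "Completeness", t.plusHours(1)),
                new Metric(null, "orders", "Size", t));
        // Prints two metrics; the order is not guaranteed (HashMap).
        latestPerGroup(metrics).forEach(System.out::println);
    }
}
```

Note that the result order follows the HashMap's iteration order, not the order of the input list.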

CodePudding user response:

Here is one way to do this.

We use Collectors.toMap, which maps a key, represented as a record MetricKey (basically just a tuple of the fields you need to group by), to a Metric. Since toMap throws an IllegalStateException on duplicate keys by default, we also provide a merge function that always keeps the metric with the latest createdAt in the map.

I would propose adding a getKey method to the Metric class that returns the key either as a record (Java 16 or later) or as a custom class that overrides equals and hashCode.

class Metric
{
  // ... all your fields
   
  // Component order matches the constructor call in getKey() below.
  record MetricKey(Long resultKeyId, DatasetType datasetType, String datasetName,
       String analyzerName, String constraintAlias) {  }
   
  public MetricKey getKey() {
    return new MetricKey(resultKeyId, datasetType, datasetName,
       analyzerName, constraintAlias);
  }

  public LocalDateTime getCreatedAt() {
    return createdAt;
  }
}

And the data processing pipeline:

List<Metric> maximums = new ArrayList<>(metrics.stream().collect(
  Collectors.toMap(
    Metric::getKey,
    Function.identity(),
    // LocalDateTime cannot be compared with '>'; use isAfter instead.
    (m1, m2) -> m1.getCreatedAt().isAfter(m2.getCreatedAt()) ? m1 : m2))
  .values());
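
For reference, here is a self-contained sketch of the toMap approach that can be compiled and run on its own. The Metric and MetricKey records are reduced, hypothetical stand-ins for the real classes, and createdAt is assumed to be non-null. Two details differ from the pipeline above and are my own choices rather than requirements: the merge function is written as BinaryOperator.maxBy, and the four-argument toMap overload is used with a LinkedHashMap so that the result keeps the order in which each key first appeared.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LatestMetrics {

    // Hypothetical grouping key; a record provides equals and hashCode.
    record MetricKey(Long resultKeyId, String analyzerName) { }

    // Hypothetical, reduced version of the Metric class from the question.
    record Metric(Long resultKeyId, String analyzerName,
                  double metricValue, LocalDateTime createdAt) {
        MetricKey key() {
            return new MetricKey(resultKeyId, analyzerName);
        }
    }

    // Keeps, for every key, the metric with the latest createdAt.
    static List<Metric> latest(List<Metric> metrics) {
        Map<MetricKey, Metric> latestByKey = metrics.stream().collect(
                Collectors.toMap(
                        Metric::key,
                        Function.identity(),
                        // On a key collision, keep the metric with the later createdAt.
                        BinaryOperator.maxBy(Comparator.comparing(Metric::createdAt)),
                        // LinkedHashMap keeps keys in first-seen order.
                        LinkedHashMap::new));
        return new ArrayList<>(latestByKey.values());
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2022, 3, 4, 10, 0);
        Metric a = new Metric(434L, "Completeness", 0.90, t);
        Metric b = new Metric(434L, "Size", 100.0, t);
        Metric c = new Metric(434L, "Completeness", 0.95, t.plusHours(1));
        // Keeps c (it replaces a) and b, in that order.
        latest(List.of(a, b, c)).forEach(System.out::println);
    }
}
```

This makes a single pass over the list and stores one entry per key, so it needs no sorting and no intermediate Optional values.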