I have a Java POJO for collecting metrics, like below:
public class Metric {
    Long metricId;
    Long resultKeyId;
    @NonNull DatasetType datasetType;
    @NonNull String datasetName;
    @NonNull String analyzerName;
    @NonNull String constraintAlias;
    @NonNull LocalDateTime entityDate;
    @NonNull long entityDurationSec;
    @NonNull Double metricValue;
    @NonNull String changedBy;
    Long jobId = 0L;
    Long codeArtifactId = 0L;
    LocalDateTime createdAt;
    LocalDateTime lastChanged;
}
I have a list of metrics built from the above POJO: List<Metric> metrics.
This list can contain multiple items, and I want to select only one record for each combination of resultKeyId, datasetType, datasetName, analyzerName and constraintAlias: the one with the max createdAt.
The SQL representation of this would be something like:
select a.* from
dataval_metric a
join dataval_metric b
  on a.result_key_id = b.result_key_id
  and a.dataset_type = b.dataset_type
  and a.dataset_name = b.dataset_name
  and a.analyzer_name = b.analyzer_name
  and a.constraint_alias = b.constraint_alias
where a.result_key_id = 434
  and a.mysql_row_created_at >= b.mysql_row_created_at;
I'm looking for pointers on how this can be done in a performant way in Java.
CodePudding user response:
You have to use the groupingBy method, using the fields as the key.
The key can be:
- a List:
Map<List<Object>, Optional<Metric>> map = metrics.stream()
    .collect(Collectors.groupingBy(m ->
        List.of(m.getResultKeyId(),
                m.getDatasetType(),
                m.getDatasetName(),
                m.getAnalyzerName(),
                m.getConstraintAlias()),
        Collectors.maxBy(Comparator.comparing(Metric::getCreatedAt))));
- an object of type Metric, if you override the methods equals and hashCode based on just the fields you want:
Map<Metric, Optional<Metric>> map = metrics.stream()
    .collect(Collectors.groupingBy(m -> m,
        Collectors.maxBy(Comparator.comparing(Metric::getCreatedAt))));
- another object with equals and hashCode overridden, like Quintet from the javatuples library:
Map<Quintet, Optional<Metric>> map = metrics.stream()
    .collect(Collectors.groupingBy(m ->
        new Quintet(m.getResultKeyId(),
                    m.getDatasetType(),
                    m.getDatasetName(),
                    m.getAnalyzerName(),
                    m.getConstraintAlias()),
        Collectors.maxBy(Comparator.comparing(Metric::getCreatedAt))));
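If you would rather not carry Optional values around, the List-key variant can probably be tightened as in the sketch below. It is only an illustration under a few assumptions: the Metric record and the DatasetType enum are trimmed-down, hypothetical stand-ins for the classes in the question, and createdAt is assumed to be non-null. Arrays.asList is used for the key instead of List.of because List.of rejects null elements, and resultKeyId is not marked @NonNull in the question.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class LatestByGroupingBy {

    // Hypothetical stand-ins for the question's types, reduced to the fields that matter here.
    enum DatasetType { TABLE, VIEW }

    record Metric(Long resultKeyId, DatasetType datasetType, String datasetName,
                  String analyzerName, String constraintAlias, LocalDateTime createdAt) { }

    // Returns one metric per (resultKeyId, datasetType, datasetName, analyzerName,
    // constraintAlias) group: the one with the latest createdAt.
    static Collection<Metric> latestPerGroup(List<Metric> metrics) {
        Map<List<Object>, Metric> latest = metrics.stream()
            .collect(Collectors.groupingBy(
                // Arrays.asList tolerates null elements, unlike List.of
                m -> Arrays.<Object>asList(m.resultKeyId(), m.datasetType(), m.datasetName(),
                                           m.analyzerName(), m.constraintAlias()),
                // every group has at least one element, so Optional::get cannot fail here
                Collectors.collectingAndThen(
                    Collectors.maxBy(Comparator.comparing(Metric::createdAt)),
                    Optional::get)));
        return latest.values();
    }
}
```

The whole thing is a single pass over the list with one hash lookup per element, so it should scale roughly linearly with the number of metrics.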
CodePudding user response:
Here is one way to do this.
We use Collectors.toMap, which maps the key, represented as a record MetricKey (basically just a tuple of the fields you need to group by), to a Metric. Since toMap doesn't allow duplicate keys, we also provide a merge function that always keeps the metric with the maximum createdAt in the map.
So I would propose adding a getKey method to the Metric class, so that it returns the key as a record or as a custom class that overrides equals and hashCode.
class Metric
{
    // ... all your fields

    // component order must match the constructor call in getKey()
    record MetricKey(Long resultKeyId, DatasetType datasetType,
                     String datasetName, String analyzerName, String constraintAlias) { }

    public MetricKey getKey() {
        return new MetricKey(resultKeyId, datasetType, datasetName,
                             analyzerName, constraintAlias);
    }

    public LocalDateTime getCreatedAt() {
        return createdAt;
    }
}
And the data processing pipeline:
List<Metric> maximums = new ArrayList<>(metrics.stream().collect(
        Collectors.toMap(
            Metric::getKey,
            Function.identity(),
            // on a key collision, keep the metric with the later createdAt
            (m1, m2) -> m1.getCreatedAt().isAfter(m2.getCreatedAt()) ? m1 : m2))
    .values());
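For completeness, here is a self-contained sketch of the same toMap idea that you can compile and run. The Metric and MetricKey records and the DatasetType enum are hypothetical, cut-down versions of the real classes. Two details go beyond the pipeline above and are my own assumptions about what you might want: a null createdAt is treated as older than any real timestamp (the field is not marked @NonNull in the question), and a LinkedHashMap keeps the keys in first-seen order.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class LatestByToMap {

    // Hypothetical stand-ins for the question's types, reduced to the relevant fields.
    enum DatasetType { TABLE, VIEW }

    record MetricKey(Long resultKeyId, DatasetType datasetType, String datasetName,
                     String analyzerName, String constraintAlias) { }

    record Metric(Long resultKeyId, DatasetType datasetType, String datasetName,
                  String analyzerName, String constraintAlias, LocalDateTime createdAt) {
        MetricKey key() {
            return new MetricKey(resultKeyId, datasetType, datasetName,
                                 analyzerName, constraintAlias);
        }
    }

    // Orders by createdAt; a null createdAt sorts before (older than) any real timestamp.
    static final Comparator<Metric> BY_CREATED_AT =
        Comparator.comparing(Metric::createdAt, Comparator.nullsFirst(Comparator.naturalOrder()));

    static List<Metric> latestPerKey(List<Metric> metrics) {
        Map<MetricKey, Metric> latest = metrics.stream()
            .collect(Collectors.toMap(
                Metric::key,
                Function.identity(),
                BinaryOperator.maxBy(BY_CREATED_AT), // keep the newer of two colliding metrics
                LinkedHashMap::new));                // keep keys in first-seen order
        return new ArrayList<>(latest.values());
    }
}
```

BinaryOperator.maxBy keeps the first of two metrics when their createdAt values compare equal, so ties resolve to whichever metric appeared earlier in the list.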