How do I deduplicate a list based on hourly intervals in Java?

Time:07-07

First of all, I have this object that I call MyObject:

public class MyObject{
    private com.google.protobuf.Timestamp timestamp;
    private String description;
}

Then I have this list:

List<MyObject> myList = new ArrayList<>();

Now let's imagine that myList contains 500 items. What I want is to eliminate duplicates (items with identical descriptions) that occur within the same hour.

So two different items with identical descriptions should not both exist in the list within the same hour. If they do, we want to only keep one and delete the other.

Example:

If the list contains the following two items:

06-07-2022T01:30:00, "some random description" and 06-07-2022T01:35:00, "some random description"

Then we want to delete one of them because they have identical descriptions and are within the same hour.

But if we have this:

06-07-2022T01:30:00, "some random description" and 06-07-2022T03:20:00, "some random description"

Then we don't want to delete any of them as they are not within the same hour.

How do I do that?

CodePudding user response:

Based on the clarifications you've given in the comments, I've used a LocalDateTime to keep the sample simple and to retrieve the hour. I'm sure that a google.protobuf.Timestamp can be converted to a proper date-time from which the hour can be extracted.

To keep only one object per description, date and hour, I've added a helper method to your POJO that returns a concatenation of these fields. The stream then collects the objects into a Map keyed by that value, so each key (description, date and hour) is associated with exactly one object; when two objects share a key, the first one is kept. Lastly, I've collected the Map's values into a List.

DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MM-yyyy'T'HH:mm:ss");

List<MyObject> list = new ArrayList<>(List.of(
        new MyObject(LocalDateTime.parse("06-07-2022T01:30:00", formatter), "some random description"),
        new MyObject(LocalDateTime.parse("06-07-2022T01:35:00", formatter), "some random description"),
        new MyObject(LocalDateTime.parse("06-07-2022T03:20:00", formatter), "some random description"),
        new MyObject(LocalDateTime.parse("06-07-2022T04:30:00", formatter), "some random description2"),
        new MyObject(LocalDateTime.parse("06-07-2022T04:35:00", formatter), "some random description2"),
        new MyObject(LocalDateTime.parse("06-07-2022T06:20:00", formatter), "some random description2"),
        new MyObject(LocalDateTime.parse("08-07-2022T01:30:00", formatter), "some random description")
));

List<MyObject> listRes = list.stream()
        .collect(Collectors.toMap(
                MyObject::getDescrDateHour,
                Function.identity(),
                (obj1, obj2) -> obj1   // on a key collision, keep the first object
        ))
        .values()
        .stream()
        .collect(Collectors.toList());

POJO Class

class MyObject {
    private LocalDateTime timestamp;
    private String description;

    public MyObject(LocalDateTime timestamp, String description) {
        this.timestamp = timestamp;
        this.description = description;
    }

    public LocalDateTime getTimestamp() {
        return timestamp;
    }

    public String getDescription() {
        return description;
    }

    public String getDescrDateHour() {
        return description + timestamp.toLocalDate().toString() + timestamp.getHour();
    }

    @Override
    public String toString() {
        return timestamp + " - " + description;
    }
}

Here is a link to test the code

https://www.jdoodle.com/iembed/v0/sZV

Output

Input: 
2022-07-06T01:30 - some random description
2022-07-06T01:35 - some random description
2022-07-06T03:20 - some random description
2022-07-06T04:30 - some random description2
2022-07-06T04:35 - some random description2
2022-07-06T06:20 - some random description2
2022-07-08T01:30 - some random description

Output: 
2022-07-06T04:30 - some random description2
2022-07-08T01:30 - some random description
2022-07-06T06:20 - some random description2
2022-07-06T03:20 - some random description
2022-07-06T01:30 - some random description
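
The answer above leaves the conversion from the protobuf Timestamp open. A protobuf Timestamp carries epoch seconds and nanoseconds (getSeconds() and getNanos()), so one possible sketch is to turn those into a java.time.Instant and truncate it to the hour. To stay free of the protobuf dependency, the sketch below takes the two values as plain parameters; HourBuckets, toHourBucket and dedupKey are hypothetical names, and the hour is computed in UTC, which is an assumption.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class HourBuckets {

    // seconds and nanos stand in for Timestamp.getSeconds() / getNanos();
    // the result is the start of the UTC hour that contains the instant.
    static Instant toHourBucket(long seconds, int nanos) {
        return Instant.ofEpochSecond(seconds, nanos).truncatedTo(ChronoUnit.HOURS);
    }

    // Hypothetical helper: a grouping key made of description and hour bucket.
    // The "|" separator avoids accidental collisions between the two parts
    // as long as descriptions do not contain it.
    static String dedupKey(String description, long seconds, int nanos) {
        return description + "|" + toHourBucket(seconds, nanos);
    }

    public static void main(String[] args) {
        long t0130 = Instant.parse("2022-07-06T01:30:00Z").getEpochSecond();
        long t0135 = Instant.parse("2022-07-06T01:35:00Z").getEpochSecond();
        long t0320 = Instant.parse("2022-07-06T03:20:00Z").getEpochSecond();

        // 01:30 and 01:35 fall into the same hour bucket: prints true
        System.out.println(toHourBucket(t0130, 0).equals(toHourBucket(t0135, 0)));
        // 01:30 and 03:20 fall into different hour buckets: prints false
        System.out.println(toHourBucket(t0130, 0).equals(toHourBucket(t0320, 0)));
    }
}
```

A key built this way could replace getDescrDateHour() in the toMap call. If "same hour" should follow local wall-clock time rather than UTC, the Instant would need to be converted with a ZoneId before truncating.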

CodePudding user response:

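A quite simple solution would be a HashMap. You use the description as the key and the timestamp as the value, so you always keep only the last timestamp for a given description and overwrite the earlier ones automatically. Note that with the description alone as the key, duplicates are dropped regardless of the hour; to respect the hourly rule from the question, the key would also have to include the hour.

If you want to keep your object, I would just sort the list by date, then fill a HashMap and transform the HashMap back into a List. It does not have the best performance, but it's easy. You can sort by date in functional Java style; see "sorting a Collection in functional style".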
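
The sort-then-map idea can be sketched roughly as follows. This is only an illustration under some assumptions: Entry is a hypothetical stand-in for MyObject that uses LocalDateTime, the key combines the description with the timestamp truncated to the hour (so the hourly rule is respected), and a LinkedHashMap replaces the plain HashMap so that the result keeps chronological order.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortThenMapDedup {

    // Hypothetical stand-in for MyObject
    static class Entry {
        final LocalDateTime timestamp;
        final String description;

        Entry(LocalDateTime timestamp, String description) {
            this.timestamp = timestamp;
            this.description = description;
        }
    }

    static List<Entry> dedupe(List<Entry> input) {
        // Sort a copy by timestamp so the caller's list is left untouched
        List<Entry> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Entry e) -> e.timestamp));

        // Key = description + hour bucket; a later put overwrites the earlier
        // value, so the latest entry of each (description, hour) pair survives
        Map<String, Entry> latestPerKey = new LinkedHashMap<>();
        for (Entry e : sorted) {
            String key = e.description + "|" + e.timestamp.truncatedTo(ChronoUnit.HOURS);
            latestPerKey.put(key, e);
        }
        return new ArrayList<>(latestPerKey.values());
    }

    public static void main(String[] args) {
        List<Entry> input = List.of(
                new Entry(LocalDateTime.parse("2022-07-06T01:35:00"), "some random description"),
                new Entry(LocalDateTime.parse("2022-07-06T01:30:00"), "some random description"),
                new Entry(LocalDateTime.parse("2022-07-06T03:20:00"), "some random description"));
        for (Entry e : dedupe(input)) {
            System.out.println(e.timestamp + " - " + e.description);
        }
    }
}
```

Sorting costs O(n log n), which is where the "not the best performance" remark comes from; if it does not matter which duplicate survives, the sort can be skipped.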

CodePudding user response:

You could define an equality calculating class (or do it in the MyObject class, depending on what it actually represents) and use it to find unique values based on the equality definition. In this case equality would mean: same description and same timestamp with hourly precision.

Here's an example (might need some tweaking, just a concept presentation):


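import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Objects;

class UniqueDescriptionWithinHourIdentifier {

    // equals and hashCode could also be implemented in MyObject
    // if its only purpose is data representation,
    // but a separate class defines a more concrete abstraction

    // note: SimpleDateFormat is not thread-safe, so do not share this across threads
    private static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyyMMddHH");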

    private Date timestamp;
    private String description;

    UniqueDescriptionWithinHourIdentifier(MyObject object) {
        timestamp = object.timestamp;
        description = object.description;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        var other = (UniqueDescriptionWithinHourIdentifier) object;
        return description.equals(other.description)
               // compare the timestamps however you want - format used for simplicity
               && DATE_FORMAT.format(timestamp)
                             .equals(DATE_FORMAT.format(other.timestamp));
    }

    @Override
    public int hashCode() {
        // must not use the full timestamp: objects within the same hour have to
        // produce the same hash. Hashing only the description puts them into one
        // bucket, and the equals method then tells the hours apart
        return Objects.hashCode(description);
    }
}

class MyObjectService {

    // here a new list without duplicates is calculated  

    List<MyObject> withoutDuplicates(List<MyObject> objects) {
        return List.copyOf(objects.stream()
                                  .collect(toMap(UniqueDescriptionWithinHourIdentifier::new,
                                                 identity(),
                                                 (e1, e2) -> e1,
                                                 LinkedHashMap::new))
                                  .values());
    }

}

CodePudding user response:

Add equals and hashCode methods to your MyObject class, with equals containing logic like the following:

@Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyObject other = (MyObject) obj;

        // Calendar.HOUR_OF_DAY and Calendar.DATE are field constants, so the
        // actual values have to be read with get(...);
        // this assumes timestamp is a java.util.Date
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(timestamp);
        int hour1 = calendar.get(Calendar.HOUR_OF_DAY);
        int date1 = calendar.get(Calendar.DATE);
        calendar.setTime(other.timestamp);
        int hour2 = calendar.get(Calendar.HOUR_OF_DAY);
        int date2 = calendar.get(Calendar.DATE);
        return Objects.equals(description, other.description)
                && hour1 == hour2 && date1 == date2;
    }

Here, I am basically checking whether two objects have the same description, hour and date; if so, they count as equal and one of them is ignored.

Once you do that, you can just use :

List<MyObject> myList = new ArrayList<>();
// distinct() does not modify myList; it returns a new list of distinct objects
List<MyObject> distinctList = myList.stream().distinct().collect(Collectors.toList());

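Please note that the stream's distinct() method relies on both equals and hashCode of the element class, so hashCode has to be consistent with this equals: two objects that are equal must return the same hash. A hashCode generated by your editor from the full timestamp would not satisfy that, so base it only on values that equals also compares (for example the description).

Note: as written, equals compares only the hour and the day of the month. You can extend it to check the month, year, etc. to verify the exact date.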
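
As a rough illustration of an equals/hashCode pair that stays consistent and covers the full date, here is a sketch using java.time instead of Calendar. HourlyEvent is a hypothetical stand-in for MyObject, and LocalDateTime is assumed as the timestamp type; truncating to the hour keeps year, month and day, so no separate date checks are needed.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in for MyObject
class HourlyEvent {
    private final LocalDateTime timestamp;
    private final String description;

    HourlyEvent(LocalDateTime timestamp, String description) {
        this.timestamp = timestamp;
        this.description = description;
    }

    LocalDateTime getTimestamp() {
        return timestamp;
    }

    // Start of the hour the timestamp falls into (date part is preserved)
    private LocalDateTime hourBucket() {
        return timestamp.truncatedTo(ChronoUnit.HOURS);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        HourlyEvent other = (HourlyEvent) obj;
        return Objects.equals(description, other.description)
                && hourBucket().equals(other.hourBucket());
    }

    @Override
    public int hashCode() {
        // Uses exactly the values equals compares, so equal objects share a hash
        return Objects.hash(description, hourBucket());
    }

    // For an ordered stream, distinct() keeps the first of each group of equal elements
    static List<HourlyEvent> distinct(List<HourlyEvent> events) {
        return events.stream().distinct().collect(Collectors.toList());
    }
}
```

One design caveat: defining equality this way on the data class itself means two objects with different minutes compare as equal everywhere (sets, map keys, contains), which may be surprising outside of this deduplication step.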
