If two datasets have some overlapping identical records, how can I exclude them from the second dataset? Equi

Time:07-21

If I have two datasets as follows:

Dataset 1

| Name | Age |
| ---- | --- |
| John | 32  |
| Vic  | 29  |
| Mary | 28  |
| Rea  | 29  |


Dataset 2

| Name | Age |
| ---- | --- |
| John | 32  |
| Joe  | 37  |
| Mary | 28  |
| Bo   | 35  |


I want to compare dataset 2 against dataset 1 and exclude the records that appear in both (John and Mary).

So dataset 2 should only include Joe and Bo.

SQL has EXCEPT, MINUS or NOT EXISTS. Does Java have an equivalent method?

Answer:

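You can use streams. Keep in mind that filter() is lazy and only returns a Stream, so you need a terminal operation such as collect() to get the filtered list (with java.util.List and java.util.stream.Collectors imported). This also assumes each record is an object (here a hypothetical Person class) that overrides equals() and hashCode(), so that contains() treats two records with the same name and age as identical:

List<Person> result = dataset2.stream().filter(o -> !dataset1.contains(o)).collect(Collectors.toList());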
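A self-contained sketch of the stream approach, assuming each record is modelled by a hypothetical `Person` class (not part of the original question) whose `equals()` and `hashCode()` compare both name and age. Copying dataset 1 into a `HashSet` first makes each `contains()` check constant time on average, instead of a linear scan of a list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class StreamExcept {

    // Hypothetical record type: two Persons are equal when name and age both match.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return age == p.age && name.equals(p.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }

        @Override
        public String toString() {
            return name + "(" + age + ")";
        }
    }

    // Returns the elements of 'second' that do not appear in 'first', preserving order.
    // Unlike SQL EXCEPT, duplicates within 'second' are not collapsed.
    static List<Person> except(List<Person> second, List<Person> first) {
        Set<Person> exclude = new HashSet<>(first); // O(1) average lookups
        return second.stream()
                .filter(p -> !exclude.contains(p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> dataset1 = Arrays.asList(
                new Person("John", 32), new Person("Vic", 29),
                new Person("Mary", 28), new Person("Rea", 29));
        List<Person> dataset2 = Arrays.asList(
                new Person("John", 32), new Person("Joe", 37),
                new Person("Mary", 28), new Person("Bo", 35));

        System.out.println(except(dataset2, dataset1)); // [Joe(37), Bo(35)]
    }
}
```

Because equality covers both fields, a "John" aged 33 in dataset 2 would be kept, which matches the "identical records" requirement in the question.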

Answer:

There are several options, depending on how your data is stored. Here is one approach if your starting point is a java.util.Map from names (String) to ages (Integer).

First, create the two maps and populate them:

import java.util.HashMap;
import java.util.Map;

Map<String, Integer> data1 = new HashMap<>();
data1.put("John", 32);
data1.put("Vic", 29);
data1.put("Mary", 28);
data1.put("Rea", 29);

Map<String, Integer> data2 = new HashMap<>();
data2.put("John", 32);
data2.put("Joe", 37);
data2.put("Mary", 28);
data2.put("Bo", 35);

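The line below takes the key set of data1 ("John", "Vic", etc.) and calls removeAll() to remove every key that is also present in data2's key set. So "John" and "Mary" are removed, and because keySet() is a view backed by the map, removing a key also removes its value: removing the "John" key removes the "32" as well. Two caveats: only names are compared, so a "John" with a different age in data2 would still be removed; and as written this filters data1 (leaving Vic and Rea), so to filter dataset 2 as the question asks, swap the maps: data2.keySet().removeAll(data1.keySet());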

data1.keySet().removeAll(data2.keySet());

Here's the same line, but with a println() before and after, and the output from running it:

System.out.println("data1 before: " + data1);
data1.keySet().removeAll(data2.keySet());
System.out.println("data1 after:  " + data1);

data1 before: {Vic=29, John=32, Mary=28, Rea=29}
data1 after:  {Vic=29, Rea=29}
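A minimal sketch of two adjustments, assuming the same data1/data2 maps as above: the roles are reversed so that dataset 2 is the one filtered (which is what the question asks for), and removeAll() is called on entrySet() instead of keySet(), so an entry is removed only when both the name and the age match. It works on a copy, so the original data2 is left untouched. The class and method names (MapExcept, except) are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class MapExcept {

    // Returns the entries of 'second' whose (key, value) pair does not appear in 'first'.
    static Map<String, Integer> except(Map<String, Integer> second, Map<String, Integer> first) {
        Map<String, Integer> result = new HashMap<>(second); // copy, so 'second' is not modified
        // Map.Entry equality compares both key and value, so "John=32" is removed
        // only if 'first' also maps "John" to 32.
        result.entrySet().removeAll(first.entrySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> data1 = new HashMap<>();
        data1.put("John", 32);
        data1.put("Vic", 29);
        data1.put("Mary", 28);
        data1.put("Rea", 29);

        Map<String, Integer> data2 = new HashMap<>();
        data2.put("John", 32);
        data2.put("Joe", 37);
        data2.put("Mary", 28);
        data2.put("Bo", 35);

        // Prints the entries for Joe and Bo (HashMap iteration order is unspecified).
        System.out.println(except(data2, data1));
    }
}
```

Using entrySet() keeps the map-based approach but gives the same "identical record" semantics as SQL EXCEPT on both columns.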