This is mainly a question intended for me to learn about various performant ways of filtering and assigning objects to Lists.
Assume
public class A implements Comparable<A> {
private String id;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
@Override
public int compareTo(A o) {
return o.getId().compareTo(this.getId());
}
}
public class B implements Comparable<B>{
private String id;
private List<A> aList = new ArrayList<>();
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public void addA(A a)
{
aList.add(a);
}
@Override
public int compareTo(B o) {
return o.getId().compareTo(this.getId());
}
}
public class Main {
public static void main(String[] args) {
SortedSet<A> aSet = new TreeSet<>();
SortedSet<B> bSet = new TreeSet<>();
for(int i=0;i<100000;i )
{
UUID uuid = UUID.randomUUID();
String uuidAsString = uuid.toString();
A a1 = new A();
a1.setId(uuidAsString);
aSet.add(a1);
A a2 = new A();
a2.setId(uuidAsString);
aSet.add(a2);
B b = new B();
b.setId(uuidAsString);
bSet.add(b);
}
//this is where the performance part comes in
//scenario: For each B I want to find A whose Id matches B's Id, and assign it to B
//assume B can have 1-5 instances of A (even though for this example I only initialized 2)
bSet.parallelStream().forEach(b -> {
aSet.parallelStream().filter(a -> {
return b.getId().equals(a.getId());
}).forEach(a -> {
b.addA(a);
});
});
}
}
The solution I came up with was to combine parallelstreams and filters to find the matching IDs between the two types of objects and then to loops through the filtered results to add the instances of A to B.
I used TreeSets here because I thought the ordered IDs might help speed things up, same reason I used parallelStreams.
This is mostly abstracted out from a scenario from a project I am doing at the office which I cant post here. The classes in the actual project have a lot more variables, and in the worst case - have sublists of lists (I resolved that using flatMaps in streams).
However my inexperienced gut tells me there is a more performant way to solve this problem.
I am primarily looking for practical ways to speed this up.
Some ways I thought of speeding this up:
- Switch the lists and sets to Eclipse Collections
- Assuming the starting point of these classes are CSV files -> Maybe write an apache spark application that will map these(I assumed that Spark could have some internal clever way of doing this faster than Streams).
- I dunno......write them all to sql tables....map them via foreign keys and then query them again?
Speed is the name of the game, solutions using vanilla java, different librarys (like Eclipse Collections), or entire engines like Spark are acceptable
Assume the minimum list size is atleast 50,000
Bonus complexity: You can add another class 'C', with multiple instances of 'B' in it. My inexperienced self can only think of writing another similar streaming operation as A->B and run it after the first stream is done. Is there a way to combine both A->B and B->C operations together so that they happen at once. That will definitely speed things up.
Sorry about my inexperienced self and sorry again if this is a duplicate too
CodePudding user response:
In your code, you use b.addA(a);
where b
is an instance of B
while B
doesn't have a method addA(A)
. Is B
supposed to keep a list of A
's?
However, the answer to your question is hashing. You are looking for a multimap, to be specific. As a quick fix you can use a TreeMap
that stores a List of A
's by their id:
public static void main(String[] args) {
TreeMap<String, ArrayList<A>> aSet = new TreeMap<>();
ArrayList<B> bSet = new ArrayList<>();
for (int i = 0; i < 100000; i ) {
UUID uuid = UUID.randomUUID();
String uuidAsString = uuid.toString();
A a1 = new A();
a1.setId(uuidAsString);
ArrayList<A> list = aSet.get(a1.getId());
if (list == null) {
list = new ArrayList<>();
aSet.put(a1.getId(), list);
}
list.add(a1);
A a2 = new A();
a2.setId(uuidAsString);
list = aSet.get(a2.getId());
if (list == null) {
list = new ArrayList<>();
aSet.put(a2.getId(), list);
}
list.add(a2);
B b = new B();
b.setId(uuidAsString);
bSet.add(b);
}
for (B b : bSet) {
System.out.println(aSet.get(b.getId()));
}
}
Please note that this isn't a good implementation and instead you should write your own multimap or use the one in guava