I am working on a sharding problem.
- Imagine I have 10 lists.
- Each list has a series of items that are independently sorted.
- I want to get the Nth item as if all the lists were sorted together in one large list.
Do I need to sort the lists overall to get an item at a particular index?
I solved a similar but not equivalent problem where there are:
- 10 lists
- Each list represents a range of items that come after the items of the previous list.
Here's the code to iterate through all the indexes of the lists:
/* code to iterate through all items in order
* threads refers to one of the lists */
int sizes[] = new int[threads.size()];
for (int i = 0; i < threads.size(); i++) {
sizes[i] = threads.get(i).data2.size();
}
int n = 0;
int thread = 0;
int size = threads.size();
int offset = 0;
long iterationStart = System.nanoTime();
while (thread < size) {
// System.out.println(String.format("%d %d", thread, offset + threads.get(thread).data.get(n)));
int current = offset + threads.get(thread).data.get(n);
n = n + 1;
if (n == sizes[thread]) {
offset += sizes[thread];
thread++;
n = 0;
}
}
long iterationEnd = System.nanoTime();
long iterationTime = iterationEnd - iterationStart;
Here's the code to look up an item by index:
int lookupKey = 329131;
int current = lookupKey;
int currentThread = 0;
int total = 0;
long lookupStart = System.nanoTime();
while (current >= 0 && currentThread <= size - 1) {
int next = current - sizes[currentThread];
if (next >= 0) {
total += sizes[currentThread];
current -= sizes[currentThread];
currentThread++;
} else {
break;
}
}
long lookupEnd = System.nanoTime();
long lookupTime = lookupEnd - lookupStart;
System.out.println(String.format("%d %d",
currentThread,
total + threads.get(currentThread).data.get(current)));
I'm hoping there's some property of sorted collections that I can use to retrieve the Nth item of the overall sorted list.
What I have, in effect, is multiple partial orders.
I have some other code that does an N-way merge between multiple sorted lists. Is the fastest option to run this in a loop up to the lookup index?
int size1 = threads.size();
int[] positions = new int[size1];
Arrays.fill(positions, 0);
PriorityQueue<Tuple> pq = new PriorityQueue<>(new Comparator<Tuple>() {
@Override
public int compare(Tuple o1, Tuple o2) {
return o1.value.compareTo(o2.value);
}
});
long startOrderedIteration = System.nanoTime();
for (ShardedTotalRandomOrder thread : threads) {
for (int i = 0; i < 10; i++) {
// System.out.println(thread.data2.get(i));
pq.add(thread.data2.get(i));
}
}
List<Integer> overall = new ArrayList<>();
while (!pq.isEmpty()) {
Tuple poll = pq.poll();
ArrayList<Tuple> data2 = threads.get(poll.thread).data2;
if (positions[poll.thread] < data2.size()) {
Tuple nextValue = data2.get(positions[poll.thread]++);
pq.offer(nextValue);
}
overall.add(poll.value);
// System.out.println(String.format("%d %d", poll.thread, poll.value));
}
System.out.println(overall);
long endOrderedIteration = System.nanoTime();
long orderedIterationTime = endOrderedIteration - startOrderedIteration;
CodePudding user response:
You don't need to re-sort them. Since each list is already sorted, you can merge them as follows. A single method merges two lists based on their relative values; the result is then fed back into the method to be merged with the next list.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
public class Merging {
public static void main(String[] args) {
List<Integer> list1 = List.of(5,10,15,20,25,30,35,40,45,50);
List<Integer> list2 = List.of(2,4,6,8,10);
List<Integer> list3 = List.of(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
int nth = 10;
List<List<Integer>> lists = List.of(list1,list2,list3);
List<Integer> merged = lists.get(0);
for (int i = 1; i < lists.size(); i++) {
merged = mergeLists(merged, lists.get(i));
}
System.out.println(merged.get(nth));
}
prints
7
- This works with any type that implements the Comparable interface.
- It loops until one list is exhausted; the loop condition also stops once the two indices together reach the combined list size.
- Once either list is finished, the remainder of the other is appended via subList.
public static <T extends Comparable<? super T>> List<T> mergeLists(List<T> list1, List<T> list2) {
List<T> merged = new ArrayList<>();
int i1 = 0;
int i2 = 0;
while (i1 + i2 < list1.size() + list2.size()) {
if (i1 >= list1.size()) {
merged.addAll(list2.subList(i2,list2.size()));
break;
}
if (i2 >= list2.size()) {
merged.addAll(list1.subList(i1,list1.size()));
break;
}
if(list1.get(i1).compareTo(list2.get(i2)) <= 0) {
merged.add(list1.get(i1++));
} else {
merged.add(list2.get(i2++));
}
}
return merged;
}
}
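If only a single index is needed, the merge can be cut short instead of building the whole list. The following is a minimal sketch, not part of the answer above: it keeps one cursor per list in a PriorityQueue and stops as soon as the element at the requested 0-based index (the same convention as merged.get(nth)) has been polled. The class name NthOfSorted and the method name nthOfSorted are hypothetical, chosen only for illustration, and the lists are assumed to support fast random access.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

public class NthOfSorted {
    /**
     * Returns the element at 0-based index nth of the hypothetical merged list,
     * without building that list. Each input list must already be sorted.
     */
    public static <T extends Comparable<? super T>> T nthOfSorted(List<List<T>> lists, int nth) {
        // Each heap entry is {list index, position in that list},
        // ordered by the value the entry currently points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> lists.get(a[0]).get(a[1]).compareTo(lists.get(b[0]).get(b[1])));
        for (int i = 0; i < lists.size(); i++) {
            if (!lists.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        for (int taken = 0; !heap.isEmpty(); taken++) {
            int[] head = heap.poll();
            List<T> source = lists.get(head[0]);
            if (taken == nth) {
                return source.get(head[1]);
            }
            // Advance the cursor of the list we just consumed from.
            if (head[1] + 1 < source.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        throw new NoSuchElementException("index " + nth + " is out of range");
    }
}
```

With the three lists from main above, NthOfSorted.nthOfSorted(lists, 10) returns 7, the same value as merged.get(nth). The heap never holds more than one entry per list, so the work is roughly nth heap operations rather than a full merge.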
CodePudding user response:
Here is a relatively efficient algorithm (each step is linear in the number of lists) that leverages some of the power of streams but avoids a full list merge.
List<List<String>> listList = //...however you create this.
int n = 5;
String curValue = null;
for (int i = 0; i < n; i++) {
// Pick the list whose first element is smallest; min() is a single linear pass.
List<String> nextList = listList.stream()
.min(Comparator.comparing(l -> l.get(0)))
.get();
curValue = nextList.remove(0);
}
//curValue is now the nth item in a hypothetical merged list.
Notice that this is destructive to the underlying lists. If that is not an option, you could keep an index per list instead, but that would greatly complicate the comparison part.
I am also not checking for list exhaustion. You could do this by adding .filter(list -> !list.isEmpty()) to the stream before the lists are compared.
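The index-based, non-destructive variant mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the class name NthByCursors and the method name nthItem are hypothetical, a plain loop replaces the stream, and n is 1-based as in the loop above. It keeps one cursor per list, skips exhausted lists, and never modifies the inputs.

```java
import java.util.List;
import java.util.NoSuchElementException;

public class NthByCursors {
    /**
     * Returns the n-th smallest item (1-based) across the sorted lists,
     * leaving the lists themselves unmodified.
     */
    public static String nthItem(List<List<String>> listList, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int[] cursor = new int[listList.size()]; // next unread position per list
        String curValue = null;
        for (int step = 0; step < n; step++) {
            int best = -1; // index of the list whose current head is smallest
            for (int i = 0; i < listList.size(); i++) {
                List<String> list = listList.get(i);
                if (cursor[i] >= list.size()) {
                    continue; // this list is exhausted
                }
                if (best < 0
                        || list.get(cursor[i]).compareTo(listList.get(best).get(cursor[best])) < 0) {
                    best = i;
                }
            }
            if (best < 0) {
                throw new NoSuchElementException("fewer than " + n + " items in total");
            }
            // Consume the smallest head by advancing that list's cursor.
            curValue = listList.get(best).get(cursor[best]++);
        }
        return curValue;
    }
}
```

Because only the cursors change, this also works with immutable lists such as those created by List.of, and it avoids the cost of remove(0) on an ArrayList.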