I need to construct a chain of pair of numbers where:
- In each pair, the first one is smaller than the second
- In order to form a chain between two consecutive nodes, they must have one number in common. In other words, the link
(a,b) -- (c,d)
can be made if and only if eithera==c
,b==c
,a==d
orb==d
- A pair cannot be made of the same number. In other words, if
(a,b)
exists, thena!=b
This may look like a Longest increasing subsequence but I actually want to chain consecutive pairs that have one equal member.
Example:
Initial list (unordered):
(0,1)
(2,3)
(1,6)
(4,6)
(8,9)
(2,8)
Result:
----- chain #1
(0,1)
(1,6)
(4,6)
----- chain #2
(2,3)
(2,8)
(8,9)
I could do an algorithm that will iterate over the entire list for each cell (O(n^2)), but I want to make it faster and I have the flexibility of ordering my initial array in any way I want (std::set, std::map, std::unordered_map, etc.). My list is made of tens of thousands of pairs so I need an efficient solution in terms of processing time.
CodePudding user response:
You can solve it in O(N * log(N))
when you manage two lists, one sorted with respect to first the other sorted with respect to second.
The code has some duplication that I didnt bother to clean up yet.
#include <iostream>
#include <list>
#include <algorithm>
#include <tuple>
#include <any>
struct pair_and_iter {
int first;
int second;
std::any other_iter;
};
struct compare_first {
bool operator()(int x,pair_and_iter p){ return x < p.first; }
bool operator()(pair_and_iter p, int x){ return p.first < x; }
};
struct compare_second {
bool operator()(int x,pair_and_iter p){ return x < p.second; }
bool operator()(pair_and_iter p, int x){ return p.second < x; }
};
template <typename Iter,typename Comp>
Iter my_find(Iter first,Iter last,int x, Comp comp) {
auto it = std::lower_bound(first,last,x,comp);
if (it != last && (!comp(x,*it) && !comp(*it,x))){
return it;
} else {
return last;
}
}
int main() {
std::list<pair_and_iter> a {{0,1},{2,3},{1,6},{4,6},{8,9},{2,8}};
std::list<pair_and_iter> b;
for (auto it = a.begin(); it != a.end(); it){
b.push_back({it->first,it->second,it});
it->other_iter = std::prev(b.end());
}
a.sort([](const auto& x,const auto& y){
return std::tie(x.first,x.second) < std::tie(y.first,y.second); });
b.sort([](const auto& x,const auto& y){
return std::tie(x.second,x.first) < std::tie(y.second,y.first); });
std::vector<std::vector<pair_and_iter>> result;
std::vector<pair_and_iter> current_result;
current_result.push_back(a.front());
auto current = current_result.begin();
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(current->other_iter));
a.erase(a.begin());
while (a.size() && b.size()) {
// look for an element with same first
auto it = my_find(a.begin(),a.end(),current->first,compare_first{});
if (it == a.end()) {
// look for element where current->second == elem.first
it = my_find(a.begin(),a.end(),current->second,compare_first{});
}
if (it != a.end()){
current_result.push_back(*it);
current = std::prev(current_result.end());
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(it->other_iter));
a.erase(it);
continue;
}
// look for element with current->first == elem.second
it = my_find(b.begin(),b.end(),current->first,compare_second{});
if (it == b.end()) {
// look for element with same second
it = my_find(b.begin(),b.end(),current->second,compare_second{});
}
if (it != b.end()) {
current_result.push_back(*it);
current = std::prev(current_result.end());
a.erase(std::any_cast<std::list<pair_and_iter>::iterator>(it->other_iter));
b.erase(it);
continue;
}
// no matching element found
result.push_back(current_result);
current_result.clear();
current_result.push_back(a.front());
current = current_result.begin();
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(current->other_iter));
a.erase(a.begin());
}
result.push_back(current_result);
for (const auto& chain : result){
for (const auto& elem : chain){
std::cout << elem.first << " " << elem.second << "\n";
}
std::cout << "\n";
}
}
0 1
1 6
4 6
2 3
2 8
8 9
I used std::list
because it has stable iterators and constant time erase. std::any
for type erasure because each list contains iterators to the other list. a
is sorted with respect to first
and b
is sorted with respect to second
. Hence std::lower_bound
can be used to to find a match in O(logN)
. A single linear search is traded against 2 binary searchs to find either current->first
or current->second
in a first
of a
and 2 binary searchs to find either current->first
or current->second
in a second
of b
. In total it is O(N log(N))
for sorting plus O( log(N) log(N-1) log(N-2) .... log(1))
which equals O(log( n! ))
if I am not mistaken.
PS: You didn't mention that you are looking for a longest chain, and this algorithm is not finding the longest chain. It just picks the first element of the remaining ones and uses the next element it finds to continue the chain.