Home > Blockchain >  Is there any better way to carry out partial match of a vector of strings in c ?
Is there any better way to carry out partial match of a vector of strings in c ?

Time:02-15

I have two vectors of strings and want to carry out a partial/full match between each string i.e

std::vector<std::string> A = { "AA", "ABC" }
std::vector<std::string> B = { "AABB", "AABC", "ABC", "BC"}

I want to perform a partial/full match for all strings in A with all strings in B. Simplest approach would be just run loops over both vectors

for(const string& s1 : A)
{
  for(const string& s2 : B)
  {
    if(s2.find(s1) != std::string::npos)
      std::cout << "partial matched" << std::endl;
  }
}

Results for the above example would be:

"AA"  --> "AABB", "AABC"
"ABC" --> "AABC", "ABC"

Is there any better way to do this?

CodePudding user response:

The brute force method is by far the simplest. You can potentially make the code cleaner by using nested calls to std::any_of.

To be more efficient, you need to wade deep into complex string searching algorithms. I believe the state of the art in multiple-pattern search is Aho-Corasick. A variant is Commentz-Walter.

There's a BSD 2-clause (looks like)-licensed implementation of Aho-Corasick on Github, though you'd probably want to add a modified main searching function to immediately return if any match is found, instead of collecting all.

If you really want to wade deep into the reeds, you could start with this paper and follow the references.

CodePudding user response:

std::set_intersection might be what you want since you do not elaborate on the context.

Assumes the request is:

  1. The match means "full string comparison".
  2. Need to know the partial matched result.

The time complexity of std::intersection is at most 2*(a.size() b.size()) while the brute force method is O(n^2). Reference here

Note: you might want to know the construction and insertion time of std::set as well.

Note 2: you may also use sorted std::vector which is cache friendly for std::set_intersection

#include <set>
#include <string>
#include <iostream>
#include <algorithm>

int main()
{
    std::set<std::string> a{"bar","foo"};
    std::set<std::string> b{"oof","foo","par"};
    std::vector<std::string> intersect;
    std::set_intersection(b.begin(),b.end(),
                        a.begin(),a.end(),
                        std::back_inserter(intersect));

    std::cout << intersect.size() << std::endl;
    for(std::string& matched: intersect){
        std::cout << matched << std::endl;    
    }
    return 0;
    
}

Live Demo

  • Related