Questions about std::begin() for std::string array and "grep" function alternatives?

Time:12-02

EDIT: This relates to my interest in the answer here:

Currently, I have been using the following, but it is obviously problematic if one needs to find str3, str4, ...

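size_t find(const std::string& line, const std::string& str1, const std::string& str2, size_t pos) {
    size_t eol1 = line.find(str1, pos);
    size_t eol2 = line.find(str2, pos);
    // earliest match; std::string::npos (not found) compares greater than any real position
    return (eol1 < eol2) ? eol1 : eol2;
}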

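// returns the earliest position of any string in vect, or std::string::npos if none occurs
size_t find(const std::string& line, const std::vector<std::string>& vect, size_t pos) {
    size_t eol1 = std::string::npos; // npos means "nothing found yet"
    for (std::vector<std::string>::const_iterator iter = vect.begin(); iter != vect.end(); ++iter) {
        //std::cout << *iter << std::endl;
        size_t eol2 = line.find(*iter, pos);
        if (eol2 < eol1) // npos is the largest size_t, so this keeps the earliest match (including position 0)
            eol1 = eol2;
    }
    return eol1;
}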

Question: Why does std::begin() work for a static array but NOT for a dynamic one, and what are the simplest or most efficient alternatives?

Curiously, I have frequently used two- or three-word searches in Fortran routines, but no compact "multi-string search" function seems to be common in the C++ community. Would one have to implement the complicated "grep" family or "regex" to get this functionality?

bool contains(const std::string& input, const std::string keywords[]) { //cannot work
    //std::string keywords[] = {"white","black","green"}; // can work
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::string& str) {return input.find(str) != std::string::npos; });
}

Why doesn't the vector version work either?

bool contains(const std::string& input, const std::vector<std::string> keywords){
    // do not forget to make the array static!
    //std::string keywords[] = {"white","black","green"};
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::string& str) {return input.find(str) != std::string::npos; });
}

Answer:

Coming at it from a different angle, maybe an array of strings isn't the best container to check against. I would advise using std::set.

#include <cassert>
#include <iostream>
#include <set>
#include <string>
#include <string_view>

std::set<std::string_view> keywords{ "common", "continue", "data", "dimension" };
std::set<char> delimiters{ ' ', ',' , '.', '!', '?', '\n' };

inline bool is_keyword(const std::string_view& word)
{
    return keywords.find(word) != keywords.end();
}

inline bool is_delimiter(const char c)
{
    return delimiters.find(c) != delimiters.end();
}

bool contains_keyword(const std::string& sentence)
{
    auto word_begin = sentence.begin();
    auto word_end = sentence.begin();

    do
    {
        // create string views over each word 
        // words are found by looking for delimiters
        // string_view is used so no data is copied into temporaries
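        while ((word_end != sentence.end()) && !is_delimiter(*word_end)) ++word_end;
        // (pointer, length) form works in C++17; the iterator-pair constructor needs C++20
        std::string_view word{ sentence.data() + (word_begin - sentence.begin()),
                               static_cast<std::size_t>(word_end - word_begin) };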

        // stop as soon as keyword is found
        if (is_keyword(word)) return true;

        // skip delimiters
        while ((word_end != sentence.end()) && is_delimiter(*word_end)) ++word_end;
        word_begin = word_end;

    } while (word_end != sentence.end());

    return false;
}

int main()
{
    std::string sentence_with_keyword{ "this input sentence, has keyword data in it" };
    bool found = contains_keyword(sentence_with_keyword);
    assert(found);

    if (found)
    {
        std::cout << "sentence contains keyword\n";
    }

    std::string sentence_without_keyword{ "this sentence will not contain any keyword!" };
    found = contains_keyword(sentence_without_keyword);
    assert(!found);

    return 0;
}

Answer:

  1. Why does std::begin() work for a static array but NOT for a dynamic one, and what are the simplest or most efficient alternatives?

What the code in the other answer is referring to is a static local variable.

// do not forget to make the array static!
static std::wstring keywords[] = {L"white",L"black",L"green", ...};

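The keyword static here is a shortcut: it gives keywords static storage duration, like a global variable, while keeping the name scoped to the function. The array is constructed the first time the function runs and reused on every later call. So a clearer way to represent what the author was saying might be this: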

// put this here...
std::wstring keywords[] = {L"white",L"black",L"green", ...};

bool ContainsMyWordsNathan(const std::wstring& input)
{
    //... instead of here
    return std::any_of(std::begin(keywords), std::end(keywords),
      [&](const std::wstring& str){return input.find(str) != std::wstring::npos;});
}
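Why the parameter version in the question fails is a separate issue from static: in the global version above, keywords is a real array, so std::begin/std::end can see its size. A function parameter declared as const std::string keywords[] is adjusted to const std::string*, so the size is lost and std::begin/std::end have nothing to work with. A minimal sketch of two ways around that (both overloads are simply named contains here for illustration): pass the array by reference with its size as a template parameter, or take a std::vector by const reference.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Array passed by reference: N is deduced, so std::begin/std::end know the bounds.
template <std::size_t N>
bool contains(const std::string& input, const std::string (&keywords)[N]) {
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}

// std::vector carries its own size; const reference avoids copying the list on each call.
bool contains(const std::string& input, const std::vector<std::string>& keywords) {
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}
```

Calling contains(line, words) with a local std::string words[] = {...} picks the template overload; calling it with a std::vector<std::string> picks the second one.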

The code will work just fine if you use a std::vector or an array inside the function, but then the list has to be rebuilt every time the function is called.

When it's defined globally, the keyword list is constructed once and left in memory for the duration of the program.
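A small sketch of that "constructed once" behavior using a static local instead of a global. The make_keywords() helper and its build_count counter are hypothetical and exist only to show how often the list is built. Since C++11, initialization of a static local is also guaranteed to be thread-safe.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical counter, only here to show how often the keyword list gets built.
int build_count = 0;

std::vector<std::string> make_keywords() {
    ++build_count;
    return {"white", "black", "green"};
}

bool contains_static(const std::string& input) {
    // Initialized on the first call only; later calls reuse the same object.
    static const std::vector<std::string> keywords = make_keywords();
    return std::any_of(keywords.begin(), keywords.end(),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}

bool contains_local(const std::string& input) {
    // Rebuilt on every call.
    const std::vector<std::string> keywords = make_keywords();
    return std::any_of(keywords.begin(), keywords.end(),
        [&](const std::string& str) { return input.find(str) != std::string::npos; });
}
```

After three calls to contains_static, build_count is 1; each call to contains_local adds one more.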


  2. I have frequently used two- or three-word searches in Fortran routines, but no compact "multi-string search" function seems to be common in the C++ community. Would one have to implement the complicated "grep" family or "regex" to get this functionality?

C++ isn't really a compact one-liner kind of language. The <algorithm> header is meant to give you a way to express your algorithm in terms that make it clear what it's doing (std::any_of, std::count, std::copy_if, etc.).
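As an illustration of that style, a sketch that reports which keywords occur (rather than only whether any does) using std::copy_if. The name found_keywords is hypothetical, and each keyword still costs one search over the text.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the keywords that occur anywhere in input, in the order they are listed.
std::vector<std::string> found_keywords(const std::string& input,
                                        const std::vector<std::string>& keywords) {
    std::vector<std::string> found;
    std::copy_if(keywords.begin(), keywords.end(), std::back_inserter(found),
        [&](const std::string& kw) { return input.find(kw) != std::string::npos; });
    return found;
}
```

For example, found_keywords("white and green", {"white", "black", "green"}) yields {"white", "green"}.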

Your code searches for one keyword at a time, making a full pass over the text for each one. Instead of doing multiple searches over the text, you might consider tokenizing your string first by finding groups of alphabetic characters, then looking each word up in a set or map to see whether it is a keyword, as the other answer suggests.

It's far from a compact one-liner, but here's how I would implement this:

#include <algorithm>
#include <cstddef>
#include <string_view>
#include <unordered_set>

bool is_alpha(const char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool is_not_alpha(const char c) {
    return !is_alpha(c);
}

std::unordered_set<std::string_view> keywords = { "red", "blue", "yellow" };

bool has_keyword(std::string_view input) {
    auto it = input.begin();
    while (it != input.end()) {
        // find the next run of letters (a word)
        auto word_start = std::find_if(it, input.end(), is_alpha);
        auto word_end = std::find_if(word_start, input.end(), is_not_alpha);
        // substr instead of &*word_start, which would dereference end() when no word is left
        std::string_view token = input.substr(static_cast<std::size_t>(word_start - input.begin()),
                                              static_cast<std::size_t>(word_end - word_start));

        // test if it's a keyword
        if (keywords.find(token) != keywords.end())
            return true;

        it = word_end;
    }

    return false;
}
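On the regex part of the question: <regex> is part of the standard library, so a single alternation pattern also works without implementing anything grep-like. A minimal sketch, assuming whole-word matching is wanted (via \b) and that the keywords contain no regex metacharacters, since they are not escaped; has_keyword_regex is a hypothetical name. std::regex is generally slower than the tokenize-and-look-up approach above, so this is a convenience rather than an optimization.

```cpp
#include <regex>
#include <string>

// One regex_search call with an alternation of all keywords.
// \b restricts matches to whole words; matching is case-sensitive.
bool has_keyword_regex(const std::string& input) {
    // static: the pattern is compiled once, on the first call
    static const std::regex pattern(R"(\b(red|blue|yellow)\b)");
    return std::regex_search(input, pattern);
}
```

With this pattern, "the sky is blue today" matches, while "blueberry pie" does not, because of the word boundaries.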