Can you help me with this aproach :
The thing is, we need to do a case-insensitive search for the keywords
in a string
(for a function which return true is if any of the keyword is found in the string, elsewise false)
So I am using this piece of code:
std::transform(string.begin(), string.end(), string.begin(), ::toupper);
std::transform(keywords.begin(), keywords.end(), keywords.begin(), ::toupper);
std::istringstream iss(keywords);
std::string word;
while(iss >> word) {
if(string.find(word) != std::string::npos)
return true;
}
return false;
The problem with this is that it creates unnecessary copies of the existing data. Can there be a better approach to it?
CodePudding user response:
First of all for making this more reuseable creating an object responsible for holding the keyword data would be preferrable. You can use std::string_view
s, std::pair<std::string::const_iterator, std::string::const_iterator>
or something similar to avoid making a copy of the string data for the keywords and using std::search
to find the keywords allows you to prevent having to copy the string to convert it to upper case for a search while also keeping benefit of converting the keywords to upper case:
class KeywordSearch
{
std::vector<std::string_view> m_keywords;
std::string m_keywordData;
public:
KeywordSearch(std::string&& keywords)
: m_keywordData(std::move(keywords))
{
auto pos = m_keywordData.begin();
auto const end = m_keywordData.end();
std::for_each(pos, end, [](char& c) { c = std::toupper(c); });
pos = std::find_if(pos, end, [](unsigned char c) { return !std::isspace(c); });
while (pos != end)
{
auto keywordEnd = std::find_if(pos 1, end, [](unsigned char c) { return std::isspace(c); });
m_keywords.emplace_back(pos, keywordEnd);
pos = std::find_if(keywordEnd, end, [](unsigned char c) { return !std::isspace(c); });
}
}
// allow only move for now; copy would require an update of m_keywords
KeywordSearch(KeywordSearch&&) noexcept = default;
KeywordSearch& operator=(KeywordSearch&&) noexcept = default;
bool operator()(std::string const& haysack) const
{
for (auto const& keyword : m_keywords)
{
if (std::search(haysack.begin(), haysack.end(), keyword.begin(), keyword.end(),
[](char haysackChar, char keywordChar)
{
return std::toupper(haysackChar) == keywordChar;
}) != haysack.end())
{
return true;
}
}
return false;
}
};
void Test(KeywordSearch const& search, std::string const& str)
{
std::cout << (search(str) ? " keyword found in '" : "keyword not found in '") << str << "'\n";
}
int main() {
KeywordSearch search("foo bar baz");
Test(search, "NoFoOB");
Test(search, "barblabla");
Test(search, "babbba");
Test(search, "hello world");
Test(search, "hello wobaz");
}
CodePudding user response:
Yes, maybe. If you want to avoid copies, then you can use an iterator
.
C offers a functionality to iterate over paterrns in a string. This is the std::sregex_token_iterator
. You can read here about that. You can either define a "positive" pattern, so, what you are looking for. Example: "\w " will look for words. Or, you do a "negative" search, meaning, specify the separator (e.g., ' '
as a std::regex
) and add "-1" as fourth parameter.
Then you can iterate over all keywords.
As for the case insenitivity. Do the conversion one time. I will not even show it in my below example.
First I created a small demo, where I print out the keywords that have been found in the given std::string
.
#include <iostream>
#include <regex>
#include <string>
#include <iterator>
#include <algorithm>
const std::regex re{ R"(\w )" };
int main() {
// Keys
const std::string keys{ "abc def ghi jkl" };
// Search string
std::string s{ "abcxxxghixxx" };
std::copy_if(std::sregex_token_iterator(keys.begin(), keys.end(), re), {},
std::ostream_iterator<std::string>(std::cout,"\n"),
[&s](const std::string& k) {return s.find(k) != std::string::npos; });
}
This approach can be taken over to build your needed function.
One of many possible solutions:
#include <iostream>
#include <regex>
#include <string>
#include <iterator>
#include <algorithm>
const std::regex re{ R"(\w )" };
bool isAnyKeyWordInString(const std::string& keys, const std::string& s) {
bool result{};
std::for_each(std::sregex_token_iterator(keys.begin(), keys.end(), re), {},
[&](const std::string& k) {result |= (s.find(k) != std::string::npos); });
return result;
}
int main() {
// Keys
const std::string keys{ "abc def ghi jkl" };
// Search string
std::string s{ "abcxxxghixxx" };
// Evaluate
if (isAnyKeyWordInString(keys, s))
std::cout << "At least one key-word found\n";
else
std::cout << "No Keyword found\n";
}