Could you recommend how to reimplement a split function to work with string_view?

Time:12-16

I wrote this split function, but I can't find an easy way to split by a string_view that holds several separator characters. My function:

size_t split(std::vector<std::string_view>& result, std::string_view in, char sep) {
    result.reserve(std::count(in.begin(), in.end(), in.find(sep) != std::string::npos) + 1);
    for (auto pfirst = in.begin();; ++pfirst) {
        auto pbefore = pfirst;
        pfirst = std::find(pfirst, in.end(), sep);
        result.emplace_back(in.data() + (pbefore - in.begin()), pfirst - pbefore);
        if (pfirst == in.end())
            return result.size();
    }
}

I want to call this split function with a string_view of separators. For example:

str = "apple, phone, bread\n keyboard, computer"
split(result, str, "\n,")
Result:['apple', 'phone', 'bread', 'keyboard', 'computer']

My question is: how can I implement this function so that it runs as fast as possible?

CodePudding user response:

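First, you are using std::count() incorrectly. Its third parameter is the value to count, not a predicate, so the bool result of in.find(sep) != std::string::npos is compared against every character. To count the characters that match a condition, use std::count_if() with a predicate instead.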

Second, std::string_view has its own find_first_of() and substr() methods, which you can use in this situation, instead of using iterators. find_first_of() allows you to specify multiple characters to search for.
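To illustrate the first point, here is a minimal sketch of the difference between the two calls. The helper names count_misused and count_separators are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Roughly what the question's reserve() argument computes: the third argument
// of std::count is a VALUE, so the bool is compared against every character
// (true converts to 1, false to 0). Hypothetical helper, for illustration only.
std::size_t count_misused(std::string_view in, char sep) {
    return std::count(in.begin(), in.end(),
                      in.find(sep) != std::string_view::npos);
}

// Counting the characters that are separators needs a predicate,
// which is what std::count_if accepts. Hypothetical helper as well.
std::size_t count_separators(std::string_view in, std::string_view seps) {
    return std::count_if(in.begin(), in.end(), [&](char ch) {
        return seps.find(ch) != std::string_view::npos;
    });
}
```

For ordinary text the first helper returns 0, because no character has the value 1, so the reserve() call in the question ends up reserving room for a single element.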

Try something more like this:

size_t split(std::vector<std::string_view>& result, std::string_view in, std::string_view seps) {
    result.reserve(std::count_if(in.begin(), in.end(), [&](char ch){ return seps.find(ch) != std::string_view::npos; }) + 1);
    std::string_view::size_type start = 0, end;
    while ((end = in.find_first_of(seps, start)) != std::string_view::npos) {
        result.push_back(in.substr(start, end-start));
        start = in.find_first_not_of(' ', end + 1);
    }
    if (start != std::string_view::npos)
        result.push_back(in.substr(start));
    return result.size();
}
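The function above skips only spaces after a separator and keeps empty fields, for example between two adjacent commas. If empty fields should be dropped instead, one possible variant alternates find_first_not_of() and find_first_of(). This is only a sketch under that assumption, not a drop-in replacement: the name split_tokens is hypothetical, and the space has to be listed in seps for it to be skipped.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant: every run of separator characters acts as one
// delimiter, so no empty tokens are produced.
std::vector<std::string_view> split_tokens(std::string_view in,
                                           std::string_view seps) {
    std::vector<std::string_view> out;
    // Jump to the first character that is not a separator.
    std::size_t start = in.find_first_not_of(seps);
    while (start != std::string_view::npos) {
        // The token ends at the next separator, or at the end of the input.
        const std::size_t end = in.find_first_of(seps, start);
        // substr() clamps the count, so end == npos yields the rest of the input.
        out.push_back(in.substr(start, end - start));
        // Skip the whole run of separators; returns npos when nothing is left.
        start = in.find_first_not_of(seps, end);
    }
    return out;
}
```

Called as split_tokens(str, "\n, ") on the string from the question, this yields the five words apple, phone, bread, keyboard and computer. The returned views point into the original buffer, so that buffer must outlive them.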


CodePudding user response:

This is my take on splitting a string_view. It loops just once over all the characters in the string_view and returns a vector of string_views, so no data is copied. The calling code can still use words.size() to get the number of words if needed. (I use the C++20 std::set::contains() function.)

Live demo here: https://onlinegdb.com/tHfPIeo1iM

#include <iostream>
#include <set>
#include <string_view>
#include <vector>

auto split(const std::string_view& string, const std::set<char>& separators)
{
    std::vector<std::string_view> words;
    auto word_begin{ string.data() };
    std::size_t word_len{ 0ul };

    for (const auto& c : string)
    {
        if (!separators.contains(c))
        {
            word_len++;
        }
        else
        {
            // we found a word, not a repeated separator
            if (word_len > 0)
            {
                words.emplace_back(word_begin, word_len);
                word_begin += word_len;
                word_len = 0;
            }

            // step over the separator itself
            word_begin++;
        }
    }
    
    // a string_view has no trailing zero, and the input need not
    // end with a separator, so if there is still a word in the
    // "pipeline", add it too
    if (word_len > 0)
    {
        words.emplace_back(word_begin, word_len);
    }

    return words;
}

int main()
{
    std::set<char> seperators{ ' ', ',', '.', '!', '\n' };
    auto words = split("apple, phone, bread\n keyboard, computer", seperators);
    
    bool comma = false;
    std::cout << "[";
    for (const auto& word : words)
    {
        if (comma) std::cout << ", ";
        std::cout << word;
        comma = true;
    }
    std::cout << "]\n";

    return 0;
}
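Since the question asks for speed, one thing that might be worth trying is replacing the std::set lookup with a flat table indexed by character value. The sketch below swaps in that technique while keeping the single-pass idea from the code above. It is not benchmarked here, the name split_lut is hypothetical, and it assumes 8-bit chars.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant of the single-pass split that uses a lookup table
// instead of std::set. Assumes 8-bit chars (256 possible values).
std::vector<std::string_view> split_lut(std::string_view text,
                                        std::string_view separators) {
    // One flag per possible byte value; true marks a separator.
    std::array<bool, 256> is_sep{};
    for (unsigned char c : separators)
        is_sep[c] = true;

    std::vector<std::string_view> words;
    std::size_t word_begin = 0;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        // Treat the end of the input like a separator so the last word is flushed.
        if (i == text.size() || is_sep[static_cast<unsigned char>(text[i])]) {
            if (i > word_begin)  // ignore empty runs between separators
                words.push_back(text.substr(word_begin, i - word_begin));
            word_begin = i + 1;
        }
    }
    return words;
}
```

With the separators " ,.!\n" and the input from main() above, this sketch returns the same five words. Whether it is actually faster depends on the input and the compiler, so it should be measured before it is relied on.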

CodePudding user response:

I do not know about performance, but this code seems a lot simpler. Note that it splits on a single delimiter character and copies each token into a std::string:

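    #include <sstream>
    #include <string>
    #include <vector>

    // Splits l on one delimiter character; every token is a copy.
    std::vector<std::string> ParseDelimited(
        const std::string &l, char delim )
    {
        std::vector<std::string> token;
        std::stringstream sst(l);
        std::string a;
        while (std::getline(sst, a, delim))
            token.push_back(a);
        return token;
    }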