Splitting string with colons and spaces?

So I've made my code work for separating the string:

const char  one[2] = {'s', 'o'};
const char *two[2] = {"...", "---"};
  
char ip(String m) {
    for (int i = 0 ; i < 2 ; i++) {
        if (strcmp(m.c_str(), two[i]) == 0) {
            return one[i];
        }
    }
}

String ipRe(char d[]) {
    int count = 0;
    char *k = d;
 
    for (count = 1; k[count]; k[count] == ':' ? count++ : *k++) {}
 
    // Serial.println(count);
  
    char *ex[count] = {NULL};
    ex[0] = strtok(d, ":");
 
    int i = 0;
 
    while (i < count - 1) { // fetch the remaining tokens without writing past ex[count - 1]
        i++;
        ex[i] = strtok(NULL, ":");
    }
 
    String c;
 
    for (int j = 0 ; j < count; j++) {
        c += ip(ex[j]);
    }
  
    return c;                      
}

void setup() {
    Serial.begin(9600);
    Serial.println(ipRe("...:---:..."));
}

Returns "sos" as it should, but how can I split the string if it has a space (or multiple spaces) like:

Serial.println(ipRe("..:---:... ..:---:..."));

So it returns "sos sos"? (Currently returns "so os")

I have had no luck with this, so any help would be greatly appreciated!

CodePudding user response：

I'd suggest using the <regex> library if your compiler supports C++11.

#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>

const std::regex ws_re(":| +"); // split on ':' or on a run of one or more spaces
void printTokens(const std::string& input)
{
    std::copy( std::sregex_token_iterator(input.begin(), input.end(), ws_re, -1),
               std::sregex_token_iterator(),
               std::ostream_iterator<std::string>(std::cout, "\n"));
}

int main()
{
    const std::string text1 = "...:---:...";
    std::cout<<"no whitespace:\n";
    printTokens(text1);

    std::cout<<"single whitespace:\n";
    const std::string text2 = "..:---:... ..:---:...";
    printTokens(text2);

    std::cout<<"multiple whitespaces:\n";
    const std::string text3 = "..:---:...   ..:---:...";
    printTokens(text3);
}

The library is documented on cppreference. If you are not familiar with regular expressions: in const std::regex ws_re(":| +"); the pattern matches either a ':' character or one or more spaces. The pipe symbol '|' means "or", and '+' means "one or more of the symbol that stands before the plus sign". You can then use this regular expression to tokenize any input with std::sregex_token_iterator; the -1 argument selects the text between the matches rather than the matches themselves. For more complex cases than whitespace, there is the wonderful regex101.com.
The only disadvantage I can think of is that a regex engine is likely to be slower than a simple handwritten tokenizer.
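
For comparison, here is a minimal sketch of what such a handwritten tokenizer might look like. It is only an illustration under some assumptions: it uses std::string rather than the Arduino String class, the names decodeSymbol and decodeMessage are hypothetical, the '?' returned for unknown codes is a placeholder of my own, and it assumes the goal is to keep one space between decoded words.

```cpp
#include <string>

// Hypothetical two-entry lookup mirroring the question's tables.
char decodeSymbol(const std::string& code) {
    if (code == "...") return 's';
    if (code == "---") return 'o';
    return '?';  // placeholder (assumption) for codes not in the table
}

// Decodes a message in which ':' separates letters and a run of spaces
// separates words. Each run of spaces becomes one space in the output.
std::string decodeMessage(const std::string& input) {
    std::string result;
    std::string token;          // code of the letter currently being read
    bool pendingSpace = false;  // a word gap was seen since the last letter

    // Emits the current token (if any), preceded by a pending word gap.
    auto flush = [&]() {
        if (token.empty()) return;
        if (pendingSpace && !result.empty()) result += ' ';
        pendingSpace = false;
        result += decodeSymbol(token);
        token.clear();
    };

    for (char ch : input) {
        if (ch == ':') {
            flush();              // end of a letter
        } else if (ch == ' ') {
            flush();              // end of a letter and of a word
            pendingSpace = true;
        } else {
            token += ch;          // part of the current letter's code
        }
    }
    flush();                      // emit the last letter
    return result;
}
```

With this sketch, decodeMessage("...:---:... ...:---:...") should give "sos sos", and extra spaces between the words do not change the result.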

CodePudding user response：

I would simply add a delimiter to your tokenizer. According to the strtok() documentation, the second parameter "is the C string containing the delimiters. These may vary from one call to another".

So add a space to the delimiter set in your tokenization: ex[i] = strtok(NULL, ": "); (and likewise in the first strtok(d, ...) call). You could also trim any whitespace from your tokens and throw away any empty tokens, but neither step should be necessary, because the delimiters won't be part of your collected tokens.
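
A rough sketch of that idea in standard C++ follows. The helper name splitTokens is hypothetical, and it copies the input into a writable buffer first because strtok modifies the string it scans. Note that treating the space as just another delimiter discards the word boundary, so decoding these tokens one after another would give "sossos" rather than "sos sos"; if the space must survive in the output, it has to be handled separately.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits the input on ':' and ' ' using strtok (hypothetical helper).
// Consecutive delimiters are skipped, so no empty tokens are produced.
std::vector<std::string> splitTokens(const std::string& input) {
    // strtok writes '\0' into the buffer it scans, so work on a copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), ": ");
         tok != nullptr;
         tok = std::strtok(nullptr, ": ")) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

For the input "...:---:...   ...:---:..." this should produce six tokens ("...", "---", "...", "...", "---", "..."), which also shows that the array in the question would need to be sized by counting spaces as well as colons.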