Parsing a string in C++ with a specific format

Time:12-29

I have this string: post "ola tudo bem como esta" alghero.jpg. I want to break it into 3 pieces: post, ola tudo bem como esta (without the quotes), and alghero.jpg. I tried it with C-style code because I am new to C++ and not very good at it yet, but it is not working. Is there a more efficient way of doing this in C++?

Program:

int main()
{
    char* token1 = new char[128];
    char* token2 = new char[128];
    char* token3 = new char[128];
    char str[] = "post \"ola tudo bem como esta\" alghero.jpg";
    char *token;
   
    /* get the first token */
    token = strtok(str, " ");
    //walk through other tokens
    while( token != NULL ) {
        printf( " %s\n", token );
        
        token = strtok(NULL, " ");
    }
    return(0);
}

CodePudding user response:

In C++14 and later, you can use std::quoted to read quoted strings from any std::istream, such as std::istringstream, e.g.:

#include <iostream>
#include <sstream>
#include <string>
#include <iomanip>

int main()
{
    std::string token1, token2, token3;
    std::string str = "post \"ola tudo bem como esta\" alghero.jpg";
   
    std::istringstream(str) >> token1 >> std::quoted(token2) >> token3;

    std::cout << token1 << "\n";
    std::cout << token2 << "\n";
    std::cout << token3 << "\n";

    return 0;
}
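
If the input might not have all three parts, the stream state can be checked after the extraction. The following is only a minimal sketch, assuming a hypothetical helper named parse_quoted (not part of the answer above) that reports whether every token was read:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: fills cmd, text and file from input.
// Returns true only if all three extractions succeeded.
bool parse_quoted(const std::string& input, std::string& cmd,
                  std::string& text, std::string& file) {
    std::istringstream in(input);
    // The stream converts to false as soon as any extraction fails.
    return static_cast<bool>(in >> cmd >> std::quoted(text) >> file);
}
```

A caller would test the return value before using the three strings, since they may be only partly filled in when parsing fails.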

CodePudding user response:

Use find to find the positions of the 2 quotes. Use substr to get the string from index 0 to first quote, first quote to second quote, and second quote to end.

std::string s = "post \"ola tudo bem como esta\" alghero.jpg";
auto first = s.find('\"');
if (first != s.npos) {
    auto second = s.find('\"', first + 1);
    if (second != s.npos) {
        std::cout << s.substr(0, first-1) << '\n';
        std::cout << s.substr(first + 1, second-first-1) << '\n';
        std::cout << s.substr(second + 2) << '\n';
    }
}

Output:

post
ola tudo bem como esta
alghero.jpg
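
The snippet above assumes exactly one space on each side of the quoted part. A possible variation, sketched here with two hypothetical helpers (trim and split_on_quotes are names made up for this example), trims the outer pieces instead and reports a missing quote pair through std::optional (C++17):

```cpp
#include <array>
#include <optional>
#include <string>

// Hypothetical helper: strips leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const auto begin = s.find_first_not_of(" \t");
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(" \t");
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: splits around the first pair of double quotes.
// Returns std::nullopt if the string does not contain two quotes.
std::optional<std::array<std::string, 3>> split_on_quotes(const std::string& s) {
    const auto first = s.find('"');
    if (first == std::string::npos) return std::nullopt;
    const auto second = s.find('"', first + 1);
    if (second == std::string::npos) return std::nullopt;
    return std::array<std::string, 3>{
        trim(s.substr(0, first)),                 // text before the opening quote
        s.substr(first + 1, second - first - 1),  // text between the quotes, untouched
        trim(s.substr(second + 1))};              // text after the closing quote
}
```

The quoted text is left as-is on purpose, so spaces inside the quotes are preserved.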

CodePudding user response:

One option for parsing strings is to use regular expressions, for example:

#include <iostream>
#include <regex>
#include <string>

// struct to hold return value of parse function
struct parse_result_t
{
    bool parsed{ false };
    std::string token1;
    std::string token2;
    std::string token3;
};

// the parse function
auto parse(const std::string& string)
{
    // The regex, piece by piece:
    // ^          matches the start of the string
    // (.*?)      lazily captures the first token
    // \s+        matches one or more whitespace characters
    // \"(.*?)\"  captures the text between double quotes (each " is escaped again for the C++ string)
    // (.*)$      captures zero or more characters until the end of the string
    // see it live here : https://regex101.com/r/XnkAZV/1
    static std::regex rx{ "^(.*?)\\s+\\\"(.*?)\\\"\\s+(.*)$" };

    std::smatch match;
    parse_result_t result;

    if (std::regex_search(string, match, rx))
    {
        result.parsed = true;
        result.token1 = match[1];
        result.token2 = match[2];
        result.token3 = match[3];
    }
    
    return result;
}

int main()
{
    auto result = parse("post \"ola tudo bem como esta\" alghero.jpg");

    std::cout << "parse result = " << (result.parsed ? "success" : "failed") << "\n";
    std::cout << "token 1 = " << result.token1 << "\n";
    std::cout << "token 2 = " << result.token2 << "\n";
    std::cout << "token 3 = " << result.token3 << "\n";

    return 0;
}

CodePudding user response:

If the strings are always separated by a single space, you can find the first and last spaces using std::string::find and std::string::rfind, split on those characters, and unquote the middle string:

#include <iostream>
#include <tuple>
#include <string>

std::string unquote(const std::string& str) {
    if (str.size() < 2 || str.front() != '"' || str.back() != '"') {
        return str;
    }
    return str.substr(1, str.size() - 2);
}

std::tuple < std::string, std::string, std::string> parse_triple_with_quoted_middle(const std::string& str) {
    auto iter1 = str.begin() + str.find(' ');
    auto iter2 = str.begin() + str.rfind(' ');

    auto str1 = std::string(str.begin(),iter1);
    auto str2 = std::string(iter1 + 1, iter2);
    auto str3 = std::string(iter2 + 1, str.end() );

    return { str1, unquote(str2), str3 };
}

int main()
{
    std::string test = "post \"ola tudo bem como esta\" alghero.jpg";
    auto [str1, str2, str3] = parse_triple_with_quoted_middle(test);
    std::cout << str1 << "\n";
    std::cout << str2 << "\n";
    std::cout << str3 << "\n";
}

You should probably put more input validation into the above, however.
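
As a rough idea of what that validation could look like, here is a sketch under the same single-space assumption. The function name parse_triple_checked is hypothetical, and returning std::optional (C++17) is just one way to signal malformed input:

```cpp
#include <optional>
#include <string>
#include <tuple>

// Hypothetical checked variant: returns std::nullopt for malformed input
// instead of building iterators from std::string::npos.
std::optional<std::tuple<std::string, std::string, std::string>>
parse_triple_checked(const std::string& str) {
    const auto first = str.find(' ');
    const auto last = str.rfind(' ');
    // Require two distinct spaces, with text before the first and after the last.
    if (first == std::string::npos || first == last ||
        first == 0 || last + 1 == str.size()) {
        return std::nullopt;
    }
    const std::string middle = str.substr(first + 1, last - first - 1);
    // The middle part must be wrapped in double quotes.
    if (middle.size() < 2 || middle.front() != '"' || middle.back() != '"') {
        return std::nullopt;
    }
    return std::make_tuple(str.substr(0, first),
                           middle.substr(1, middle.size() - 2),
                           str.substr(last + 1));
}
```

Callers would check has_value() on the result before unpacking it with a structured binding.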

CodePudding user response:

You could use regular expressions for this:

  • The pattern to search for repeatedly is: optional leading whitespace, \s*; then ([^\"]*), zero or more characters other than quotes (zero or more because several quotes could appear one after the other), captured as a group (hence the parentheses); and finally either a quote \" or (|) the end of the expression $, in a group that we do not capture (?:).
    We use std::regex to store the pattern, wrapping it all within R"()", so that we can write the raw expression.
  • The while loop does a few things: it searches for the next match with regex_search, extracts the captured group, and updates the input line, so that the next search starts where the current one finished. It stops once line is empty, because the pattern can also match an empty string and the search would otherwise never fail.
    matches is an array whose first element, matches[0], is the part of line matching the whole pattern, and the next elements correspond to the pattern's captured groups.

[Demo]

#include <iostream>  // cout
#include <regex>  // regex_search, smatch
#include <string>  // string

int main() {
    std::string line{"post \"ola tudo bem como esta\" alghero.jpg"};
    std::regex pattern{R"(\s*([^\"]*)(?:\"|$))"};
    std::smatch matches{};
    // Stop on an empty line: the pattern also matches "", which would loop forever.
    while (!line.empty() && std::regex_search(line, matches, pattern))
    {
         std::cout << matches[1] << "\n";
         line = matches.suffix();
    }
}