How to name regex group matches in C++ the way Python does (?P<name_of_regex>(.*))

Time:09-27

I have a string in my program that contains certain values for parameters. I need to extract the values from the parameters using regex.

The regex looks like this:

std::smatch param;
std::string str = "--name=AName --age=AnAge --gender=AGender";

if (std::regex_match(str, param, std::regex(".*--name=(\\w+) .*--age=(\\d+) .*--gender=(\\w+) .*")))
{
    // If the parameters appear in this order, execution reaches here
    // and the values are stored in param[1] through param[3].
}

The problem is that the parameters can appear in any order, for example:

std::string str = "--gender=AGender --name=AName --age=AnAge";
std::string str = "--age=AnAge --gender=AGender --name=AName";
std::string str = "--name=AName --gender=AGender --age=AnAge ";

Is there a way to write a single regex that captures the values regardless of their order, instead of running one regex per parameter? If so, how can I access each value? In Python it is possible to put an <id> before the desired group and later access the group by that identifier. In my example I use a std::smatch variable, but accessing it depends on the order of the parameters in the string, and I cannot rely on that.

CodePudding user response:

Use this regex: "^(?=.*--name=(\\w+))(?=.*--age=(\\d+))(?=.*--gender=(\\w+)).+"

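The one problem you'll run into is that std::smatch exposes groups only by index, not by name, so param cannot tell you by itself which value belongs to which parameter; you have to rely on the position of each group in the pattern.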
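To make the lookahead approach concrete, here is a minimal sketch. The helper name parseParams and the sample input are hypothetical, not part of the question. As far as I can tell, each lookahead is tried from the start of the string, so the group numbers follow the order of the groups in the pattern rather than the order of the parameters in the input.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration: returns true and fills the three
// outputs when all parameters are present, in any order.
bool parseParams(const std::string& input,
                 std::string& name, std::string& age, std::string& gender)
{
    // Each lookahead scans from the beginning of the string, so the order of
    // the parameters in the input does not matter. Group numbers follow the
    // order of the groups in the pattern: 1 = name, 2 = age, 3 = gender.
    static const std::regex pattern(
        "^(?=.*--name=(\\w+))(?=.*--age=(\\d+))(?=.*--gender=(\\w+)).+");

    std::smatch param;
    if (!std::regex_match(input, param, pattern))
        return false;  // at least one parameter is missing or malformed

    name = param[1].str();
    age = param[2].str();
    gender = param[3].str();
    return true;
}
```

Calling parseParams("--gender=F --name=Bob --age=42", name, age, gender) should fill name, age and gender in that fixed order. Note that (\\d+) only accepts a numeric age, so a placeholder such as AnAge would not match.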

The way I would solve this problem would be to use std::string::find.

For example:

std::string str = "--name=AName --age=AnAge --gender=AGender";
size_t namePos = str.find("--name=");
size_t agePos = str.find("--age="); 
size_t genderPos = str.find("--gender=");

std::string name = "";
std::string gender = "";
std::string age = "";

if(namePos != std::string::npos)
{
  // Add 7 to namePos since the size of "--name=" is 7.
  // Assuming that the delimiter of the name is whitespace so find the first
  // whitespace after --name=
  name = str.substr(namePos + 7, str.find_first_of(" \n\r", namePos + 7) - (namePos + 7));
}

if(agePos != std::string::npos)
{  
  // Add 6 to agePos since the size of "--age=" is 6.
  // Assuming that the delimiter of the age is whitespace so find the first
  // whitespace after --age=
  age = str.substr(agePos + 6, str.find_first_of(" \n\r", agePos + 6) - (agePos + 6));
}

if(genderPos != std::string::npos)
{
  // Add 9 to genderPos since the size of "--gender=" is 9.
  // Assuming that the delimiter of the gender is whitespace so find the first
  // whitespace after --gender=
  gender = str.substr(genderPos + 9, str.find_first_of(" \n\r", genderPos + 9) - (genderPos + 9));
}

std::cout << name << " " << gender << " " << age << std::endl;


Output:

AName AGender AnAge
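The three blocks above repeat the same steps, so they could be folded into one helper. This is only a sketch: valueOf is a hypothetical name, and it keeps the same assumption that a value ends at the first whitespace character. It is also naive in that it takes the first occurrence of the key anywhere in the string.

```cpp
#include <string>

// Hypothetical helper: returns the text between `key` and the next
// whitespace character, or an empty string if `key` is not present.
std::string valueOf(const std::string& str, const std::string& key)
{
    const std::size_t keyPos = str.find(key);
    if (keyPos == std::string::npos)
        return "";

    // Skip past the key itself instead of hard-coding its length.
    const std::size_t start = keyPos + key.size();
    const std::size_t end = str.find_first_of(" \n\r", start);

    // When no whitespace follows, take everything up to the end of the string.
    return str.substr(start, end == std::string::npos ? std::string::npos
                                                      : end - start);
}
```

With this, the lookups become valueOf(str, "--name="), valueOf(str, "--age=") and valueOf(str, "--gender="), and the order of the parameters in str no longer matters.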

CodePudding user response:

There are better tools for parsing command lines, but if you really want to use regex, you will find that Boost.Regex makes this much easier than std::regex.

In particular, it supports named groups (see e.g. Boost Regular Expression: Getting the Named Group) which is the feature you request in your question title.

You can combine that with BOOST_REGEX_MATCH_EXTRA to keep all matches for all named groups (by default, only the last match for each capture group is accessible after the search.)

Then you can just make a big disjunction ((?<group1>...)|(?<group2>...)|...) in your regex for all the groups you may encounter, and you will be able to get all values out regardless of their order.
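If Boost is not an option, a similar name-keyed result can be approximated with std::regex alone. Note that this swaps in a different technique: instead of named groups, it iterates over generic "--key=value" pairs with std::sregex_iterator and stores them in a std::map. The helper name parseNamedParams and the assumption that values contain no whitespace are mine, not part of the answer above.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collects every "--key=value" pair into a map keyed by
// parameter name, regardless of the order in which the pairs appear.
std::map<std::string, std::string> parseNamedParams(const std::string& input)
{
    // Group 1 is the parameter name, group 2 its value (up to the next
    // whitespace character).
    static const std::regex pair("--(\\w+)=(\\S+)");

    std::map<std::string, std::string> params;
    for (std::sregex_iterator it(input.begin(), input.end(), pair), end;
         it != end; ++it)
    {
        // If a key appears more than once, the last occurrence wins.
        params[(*it)[1].str()] = (*it)[2].str();
    }
    return params;
}
```

The values can then be read by name, for example params["name"] or params["age"], which is close in spirit to accessing a named group.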
