Regex pattern issue remove specific digits

Time:04-01

I'm trying to use a regex to extract only the date part of a string, in the format "01 Apr 2022", but I'm having trouble dropping the time digits "07:28:00".

std::string test = "Fri, 01 Apr 2022 07:28:00 GMT";

std::string get_date(std::string str) {
    static std::vector<std::regex> patterns = {
        std::regex{"Fri,(.+)([0-9]+)GMT"},
    };

    for (auto& regex : patterns) {
        std::smatch m;
        if (std::regex_search(str, m, regex)) {
            return m[1]; 
        }
    }
    return str;
}

CodePudding user response:

I would (strongly) advise against using a regex for this purpose.

The C++ standard library already provides std::get_time for tasks like this, and I'd advise simply using it. The format you've shown fits a get_time format string such as "%a, %d %b %Y %T".

Demo code:

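#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

std::string test = "Fri, 01 Apr 2022 07:28:00 GMT";

int main() {
    std::istringstream buffer { test };

    std::tm t = {};  // zero-initialize so unparsed fields are not garbage

    // %a = weekday name, %d = day, %b = month name, %Y = year, %T = HH:MM:SS
    buffer >> std::get_time(&t, "%a, %d %b %Y %T");

    if (buffer.fail()) {
        std::cerr << "Could not parse: " << test << "\n";
        return 1;
    }

    std::cout << "Hour: " << t.tm_hour
              << ", Minute: " << t.tm_min
              << ", Second: " << t.tm_sec << "\n";
}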

CodePudding user response:

Here is a regex that will do the job: std::regex reg{R"(\d{2} \w+ \d{4})"};. Because this pattern has no capture group, use m[0] (the whole match) in your code, not m[1].


But if your format is stable (and it certainly looks that way), you don't need a regex at all. Just do something like this: str.substr(5, 11) or std::string(str.begin() + 5, str.begin() + 16).

CodePudding user response:

You can use

std::regex{R"(^[a-zA-Z]{3},\s*(.*?)\s*\d{2}(?::\d{2}){2})"}

See the regex demo. Details:

  • ^ - start of string
  • [a-zA-Z]{3} - three letters
  • , - a comma
  • \s* - zero or more whitespaces
  • (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
  • \s* - zero or more whitespaces
  • \d{2}(?::\d{2}){2} - two digits, :, two digits, : and two digits.

See the C++ demo:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

std::string get_date(const std::string& str) {
    // Group 1 captures everything between the weekday prefix and the HH:MM:SS time
    static const std::vector<std::regex> patterns = {
        std::regex{R"(^[a-zA-Z]{3},\s*(.*?)\s*\d{2}(?::\d{2}){2})"},
    };

    for (const auto& regex : patterns) {
        std::smatch m;
        if (std::regex_search(str, m, regex)) {
            return m[1];
        }
    }
    return str;  // no pattern matched: return the input unchanged
}

int main() {
    std::cout << get_date("Fri, 01 Apr 2022 07:28:00 GMT") << std::endl;
    return 0;
}

Output:

01 Apr 2022

CodePudding user response:

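If you write the pattern like this (in C++ source, put it in a raw string literal so the backslashes need no escaping):

"(Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+(\d{1,})\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{4})\s+(\d{2}:\d{2}:\d{2})\s+GMT"

then the 5th group, m[5], gives you the time (hh:mm:ss) part, while m[2], m[3] and m[4] give the day, month and year.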
