My strings are:
- "TESTING_ABC_1-JAN-2022.BCK-gz;1"
- "TESTING_ABC_30-JAN-2022.BCK-gz;1"
In bash when I run:
echo "TESTING_ABC_1-JAN-2022.BCK-gz;1" | sed 's/.*\([0-9]\{1,2\}-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9]\).*/\1/'
it returns 1-JAN-2022 which is good.
But when I run:
echo "TESTING_ABC_30-JAN-2022.BCK-gz;1" | sed 's/.*\([0-9]\{1,2\}-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9]\).*/\1/'
I get 0-JAN-2022, but I want 30-JAN-2022.
How can I extract the date with a single command so that it works for both single- and double-digit days, like "30-JAN-2022" or "1-JAN-2022"?
CodePudding user response:
Using sed
$ echo "TESTING_ABC_1-JAN-2022.BCK-gz;1
> TESTING_ABC_30-JAN-2022.BCK-gz;1" | sed -E 's/[^0-9]*([^.]*).*/\1/'
1-JAN-2022
30-JAN-2022
CodePudding user response:
It is much easier to use awk with a field separator and avoid a long regex:
cat file
TESTING_ABC_1-JAN-2022.BCK-gz;1
TESTING_ABC_30-JAN-2022.BCK-gz;1
awk -F '[_.]' '{print $3}' file
1-JAN-2022
30-JAN-2022
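To see why $3 is the date, this small sketch prints every field awk produces for one of the sample names when splitting on _ or . (the variable names are only for illustration). It assumes, as the command above does, that exactly two underscores precede the date; a prefix with more or fewer underscores would shift the field number.

```shell
name='TESTING_ABC_30-JAN-2022.BCK-gz;1'

# Split on either "_" or "." and list each field with its index.
fields=$(printf '%s\n' "$name" |
  awk -F '[_.]' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

echo "$fields"
# 1: TESTING
# 2: ABC
# 3: 30-JAN-2022
# 4: BCK-gz;1
```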
Another option is to use grep -Eo with a regex that matches a date in DD-MON-YYYY format:
grep -Eo '[0-9]{1,2}-[A-Z]{3}-[0-9]{4}' file
1-JAN-2022
30-JAN-2022
CodePudding user response:
The problem with your regex is the greedy * quantifier: the leading .* will match as many characters as possible while still allowing the rest of your expression to match. Because [0-9]\{1,2\} is satisfied by a single digit, .* also consumes the 3 of 30 and leaves only 0 for the capture group. In many regex implementations you can switch the greediness of * by adding ?. So /.*?a/ would match as few characters as possible until it finds an a.
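To make the greedy match visible, here is a small sketch that uses the pattern from the question but also captures the leading .* and prints it in brackets next to the date group (the extra capture group and the bracket markers are added purely for illustration):

```shell
s='TESTING_ABC_30-JAN-2022.BCK-gz;1'

# Same pattern as in the question, but the leading .* is captured as \1
# so we can see how much it consumed; the date group becomes \2.
split=$(printf '%s\n' "$s" |
  sed 's/\(.*\)\([0-9]\{1,2\}-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9]\).*/[\1] [\2]/')

echo "$split"   # [TESTING_ABC_3] [0-JAN-2022]
```

The first bracket ends in 3, which shows that .* took the first digit of the day.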
Unfortunately, sed doesn't support non-greedy quantifiers. Here are two options:
If the text before the date always ends with _, you can simply add _ after the .* part:
$ sed -r 's/.*_([0-9]{1,2}-[A-Z]{3}-[0-9]{4}).*/\1/' <<< "TESTING_ABC_30-JAN-2022.BCK-gz;1"
30-JAN-2022
Or just grep the relevant parts:
$ grep -Po '([0-9]{1,2}-[A-Z]{3}-[0-9]{4})' <<< "TESTING_ABC_30-JAN-2022.BCK-gz;1"
30-JAN-2022
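If the text before the date does not reliably end with _, a variation on the sed option is to require any non-digit character right before the capture group. This is only a sketch: it assumes at least one non-digit character always precedes the day, and extract_date is a hypothetical helper name used for the example.

```shell
# extract_date is a hypothetical helper name used only for this example.
extract_date() {
  # [^0-9] forces the character just before the day to be a non-digit,
  # so the greedy .* cannot take the "3" of "30".
  printf '%s\n' "$1" |
    sed -E 's/.*[^0-9]([0-9]{1,2}-[A-Z]{3}-[0-9]{4}).*/\1/'
}

extract_date 'TESTING_ABC_1-JAN-2022.BCK-gz;1'    # 1-JAN-2022
extract_date 'TESTING_ABC_30-JAN-2022.BCK-gz;1'   # 30-JAN-2022
```

A string that starts directly with the date would not match this pattern, so sed would print it unchanged.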