Regex sed does not give me expected result

Time:12-08

sed doesn't give me the expected result. I want the text captured by Group 2, but sed returns it empty. I ran this command on Ubuntu 20.04.3 LTS with sed (GNU sed) 4.7. When I tried the same pattern on regex101.com, it gave me the expected result. You can see it here.

root@6ab6c9bc0d76:~# cat /etc/issue
Ubuntu 20.04.3 LTS \n \l
root@6ab6c9bc0d76:~# sed --version
sed (GNU sed) 4.7
Packaged by Debian
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3 : GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
Paolo Bonzini, Jim Meyering, and Assaf Gordon.
GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.

Group 2 is empty.

root@6ab6c9bc0d76:~# echo "https://one-two-three-four-five.dev.domain.com" | sed -E "s/(https?:\/\/)([\w|-]*)(.*)/Group1: \1\nGroup2: \2\nGroup3: \3/"
Group1: https://
Group2:
Group3: one-two-three-four-five.dev.domain.com
root@6ab6c9bc0d76:~#

Answer:

With your GNU sed, you can use

#!/bin/bash
echo "https://one-two-three-four-five.dev.domain.com" | \
 sed -E "s~(https?://)([[:alnum:]_-]*)(.*)~Group1: \1\nGroup2: \2\nGroup3: \3~"

Output:

Group1: https://
Group2: one-two-three-four-five
Group3: .dev.domain.com

See the online demo.

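Inside a POSIX bracket expression the backslash is literal, so \w is not a word-character shorthand there: [\w|-] matches a single backslash, w, | or - character. The first character after https:// is o, which is not in that set, so ([\w|-]*) matches the empty string and (.*) takes the rest of the line. That is why Group 2 is empty. regex101.com uses a PCRE flavor by default, where \w inside brackets does mean a word character, hence the different result there. The [:alnum:] POSIX character class matches letters and digits; since \w also matches underscores, combine [:alnum:] and _ inside the bracket expression, and add - to match hyphens as well: [[:alnum:]_-]. Note that the - must be placed at the start or end of the bracket expression to be taken literally.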
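To see the difference directly, here is a small sketch assuming GNU sed (the variable names bracket_demo and word_demo are only illustrative). It replaces everything that [\w|-] matches with X, then uses \w outside brackets for comparison:

```shell
#!/bin/bash
# Inside brackets, the backslash is literal: [\w|-] matches one of
# backslash, w, | or -. Letters other than w are left untouched.
bracket_demo=$(printf '%s\n' 'a\w|-b' | sed -E 's/[\w|-]/X/g')
echo "$bracket_demo"   # aXXXXb

# Outside brackets, GNU sed treats \w as a word character (a GNU extension),
# so it matches "one" and stops at the hyphen.
word_demo=$(printf '%s\n' 'one-two.dev' | sed -E 's/^(\w+).*/\1/')
echo "$word_demo"      # one
```

The \w shorthand outside brackets is a GNU extension, so [[:alnum:]_] is the more portable spelling if the script may run under another sed.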

I used ~ as the regex delimiter because the pattern contains / characters; this avoids having to escape each of them.
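As a quick check that the delimiter is only a matter of readability, the sketch below (assuming GNU sed; url, with_slash and with_tilde are illustrative names) runs the same substitution with both delimiters and prints only Group 2:

```shell
#!/bin/bash
url="https://one-two-three-four-five.dev.domain.com"

# Default / delimiter: every / in the pattern has to be escaped.
with_slash=$(echo "$url" | sed -E 's/(https?:\/\/)([[:alnum:]_-]*)(.*)/\2/')

# ~ delimiter: the slashes in the pattern can be written as-is.
with_tilde=$(echo "$url" | sed -E 's~(https?://)([[:alnum:]_-]*)(.*)~\2~')

echo "$with_slash"   # one-two-three-four-five
echo "$with_tilde"   # one-two-three-four-five
```

Any character that does not occur in the pattern or the replacement can serve as the delimiter; | and # are other common choices.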
