Given a string, I'd like to use a regex to:
- if the given string does NOT match the regex, return the ENTIRE string
- if the given string does match the regex, return ONLY the capture group
Let's say I have the following regex:
hello\s*([a-z]+)
Here are some inputs and the output I am looking for:
"well hello" --> "well hello" (regex did not match)
"well hello world extra words" --> "world"
"well hello world!!!" --> "world"
"well hello \n \n world\n\n\n" --> "world" (should ignore all newlines)
"this string doesn't match at all" --> "this string doesn't match at all"
Limitations: I am limited to grep, sed, and awk; egrep and gawk are not available.
> print "world hello something else\n" | sed -rn "s/hello ([a-z]+)/\1/p"
world something else
This is the closest I've gotten. A few things:
- it is returning other parts of the string
- I couldn't get \s* to match, but a regular space works
- not exactly sure, but the /p at the end of the sed command seems to print a newline
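A small sketch illustrating the first two points, assuming a sed that accepts -E (GNU and BSD sed both do). [[:space:]] is the POSIX bracket-expression spelling of "any whitespace character", while \s is not part of POSIX regular expressions, so not every sed supports it. The output also shows that s/// replaces only the text the regex matched, which is why the surrounding words survive.

```shell
#!/bin/sh
# Assumes a sed that accepts -E (extended regular expressions).
# [[:space:]]* is the portable way to say "zero or more whitespace characters";
# it matches the space, tab, and space between "hello" and "world" here.
printf 'well hello \t world extra words\n' |
  sed -E 's/hello[[:space:]]*([a-z]+)/\1/'
# prints: well world extra words
# Only "hello \t world" was matched and replaced, so "well " and
# " extra words" are left in place.
```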
Answer:
Use an alternation:
hello\s*([a-z]+)|(.*)
Then extract groups 1 and 2:
sed -rn "s/hello ([a-z]+)|(.*)/\1\2/p"
The alternation is tried left to right, so if the first part doesn't match, the whole input is matched; one of group 1 or group 2 will be empty.
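A minimal sketch of a different technique that avoids the alternation, assuming a sed that accepts -E: put .* on both sides of the pattern so a successful substitution replaces the whole line with group 1, and drop -n and /p so sed's default printing passes non-matching lines through unchanged. The function name extract_word is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch assuming a sed that accepts -E; extract_word is a hypothetical name.
extract_word() {
  # The leading and trailing .* make the match cover the whole line, so a
  # successful substitution leaves only capture group 1. Without -n, sed
  # prints every line, so a line that does not match comes out unchanged.
  printf '%s\n' "$1" | sed -E 's/.*hello[[:space:]]*([a-z]+).*/\1/'
}

extract_word 'well hello'
extract_word 'well hello world extra words'
extract_word 'well hello world!!!'
extract_word "this string doesn't match at all"
```

This prints well hello, world, world, and the unchanged last sentence, one per line. Because sed reads its input one line at a time, this sketch does not cover the example where real newlines separate hello from world.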
Answer:
This might work for you (GNU sed):
sed -E 's/\\n/\n/g;/^well hello\s*([a-z]+).*/s//\1/;s/\n/\\n/g' file
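Turn each literal \n into a real newline.
Match on lines that begin with well hello, followed by zero or more whitespace characters (newlines included), followed by one or more characters a through z, followed by anything else. If the match succeeds, replace it with the captured characters a through z; otherwise the original string is left unchanged.
Finally, turn any remaining newlines back into \n.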
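To see the command above on the question's sample inputs, here is a small harness, assuming GNU sed (\s, a . that also matches embedded newlines, and \n in the replacement are GNU behaviours). The function name run_samples is hypothetical, and the inputs are fed through a here-document instead of the file named in the answer.

```shell
#!/bin/sh
# Assumes GNU sed. run_samples is a hypothetical helper name.
run_samples() {
  # The quoted 'EOF' keeps backslashes intact, so "\n" in the fourth line
  # stays a literal backslash followed by n, as in the question.
  sed -E 's/\\n/\n/g;/^well hello\s*([a-z]+).*/s//\1/;s/\n/\\n/g' <<'EOF'
well hello
well hello world extra words
well hello world!!!
well hello \n \n world\n\n\n
this string doesn't match at all
EOF
}

run_samples
```

Expected output, one result per line: well hello, world, world, world, and the unchanged last sentence. The ^well hello anchor ties this command to inputs that start with those words.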