For example, I have the word: `sh0rt-t3rm`.
How can I get the `t3rm` part using perl regex?
I could get `sh0rt` by using `[(a-zA-Z0-9)+]\[-\]`, but `\[-\][(a-zA-Z0-9)+]` doesn't work to get `t3rm`.
CodePudding user response:
The syntax used for the regex is not correct to get either `sh0rt` or `t3rm`.
You flipped the square brackets and the parentheses, and the hyphen does not have to be between square brackets.
To get `sh0rt` in `sh0rt-t3rm` you might use, for example, one of:
Regex | Demo | Explanation |
---|---|---|
`\b([a-zA-Z0-9]+)-` | Demo 1 | `\b` is a word boundary to prevent a partial word match; the value is in capture group 1. |
`\b[a-zA-Z0-9]+(?=-)` | Demo 2 | Match the allowed chars in the character class, and assert a `-` to the right using a positive lookahead `(?=-)`. |
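
As a rough sketch of how these two patterns could be used from Perl: the variable names below are only illustrative, and the character class is assumed to be followed by a `+` quantifier.

```perl
use strict;
use warnings;

# Sample input from the question; $word is just an illustrative name.
my $word = 'sh0rt-t3rm';

# Capture group: the hyphen is consumed by the match, group 1 holds the value.
my ($first_by_group) = $word =~ /\b([a-zA-Z0-9]+)-/;

# Lookahead: the hyphen is only asserted, so the whole match ($&) is the value.
my $first_by_lookahead = $word =~ /\b[a-zA-Z0-9]+(?=-)/ ? $& : undef;

print "$first_by_group\n";       # sh0rt
print "$first_by_lookahead\n";   # sh0rt
```

The lookahead form is handy when the match itself (rather than a group) is used, for instance in a substitution.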
To get `t3rm` in `sh0rt-t3rm` you might use, for example, one of:
Regex | Demo | Explanation |
---|---|---|
`-([a-zA-Z0-9]+)\b` | Demo 3 | The other way around, with a leading `-`; get the value from capture group 1. |
`-\K[a-zA-Z0-9]+\b` | Demo 4 | Match `-` and use `\K` to keep out what is matched so far. Then match 1 or more times the allowed chars in the character class. |
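
A small sketch of both variants in Perl, again with illustrative variable names and assuming a `+` quantifier after the character class. Note that `\K` needs Perl 5.10 or newer.

```perl
use strict;
use warnings;

# Sample input from the question; $word is just an illustrative name.
my $word = 'sh0rt-t3rm';

# Capture group after a literal hyphen.
my ($second_by_group) = $word =~ /-([a-zA-Z0-9]+)\b/;

# \K drops the already-matched hyphen, so $& holds only what follows it.
my $second_by_keep = $word =~ /-\K[a-zA-Z0-9]+\b/ ? $& : undef;

print "$second_by_group\n";   # t3rm
print "$second_by_keep\n";    # t3rm
```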
CodePudding user response:
If your whole target string is literally just `sh0rt-t3rm`, then you want everything that comes after the `-`.
So the barest, minimal version, cut precisely for this description, is
my ($capture) = $string =~ /-(.+)/;
We need parentheses on the left-hand side so that the match runs in list context, because that is when it returns the captures (in scalar context it returns true/false, normally `1` or `''`).
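
To make the context difference concrete, here is a short sketch (the `$matched` and `$failed` names are only for illustration, and the pattern is assumed to be `/-(.+)/`):

```perl
use strict;
use warnings;

my $string = 'sh0rt-t3rm';

# List context (parentheses around the variable): returns the captures.
my ($capture) = $string =~ /-(.+)/;

# Scalar context: returns only whether the match succeeded.
my $matched = $string =~ /-(.+)/;

# A failed match in scalar context gives the empty string.
my $failed = $string =~ /_(.+)/;

print "capture: $capture\n";   # capture: t3rm
print "matched: $matched\n";   # matched: 1
print "failed: '$failed'\n";   # failed: ''
```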
But what if the preceding text may itself contain a `-`? Then make sure to match everything up to the last `-`:
my ($capture) = $string =~ /.*-(.+)/;
Here the "greedy" nature of the `*` quantifier makes the preceding `.` match as much as it possibly can while the whole pattern still matches; thus it goes up to the very last `-`.
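
A quick sketch of the difference on a made-up string with two hyphens (the input `very-sh0rt-t3rm` and the variable names are assumptions for illustration only):

```perl
use strict;
use warnings;

# Hypothetical input with more than one hyphen.
my $string = 'very-sh0rt-t3rm';

# Without a leading .* the capture starts right after the FIRST hyphen.
my ($after_first) = $string =~ /-(.+)/;

# The greedy .* first consumes as much as it can, then backtracks
# to the LAST hyphen so that the rest of the pattern can match.
my ($after_last) = $string =~ /.*-(.+)/;

print "$after_first\n";   # sh0rt-t3rm
print "$after_last\n";    # t3rm
```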
There are of course many other variations on what the data may look like, other than just being one hyphenated word. In particular, if it's part of a text, you may want to include word boundaries:
my ($capture) = $string =~ /\b.*?-(.+?)\b/;
Here we also need to adjust our "wild-card"-like pattern `.` by limiting it with `?` so that it is not greedy. This matches the first such hyphenated word in the `$string`. But if indeed only "word" characters are allowed, then we can just use `\w` (instead of `.` and word-boundary anchors):
my ($capture) = $string =~ /\w*?-(\w+)/;
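Note that `\w` matches only `[a-zA-Z0-9_]` in ASCII text (under Unicode rules it also matches letters and digits of other scripts, and the `/a` modifier restricts it to ASCII). Either way it excludes some characters that may appear inside words in normal text, such as the apostrophe.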
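
As an illustration, both in-text variants can be tried on a made-up sentence (the `$text` content and the variable names are assumptions, not from the question, and the patterns are assumed to be `/\b.*?-(.+?)\b/` and `/\w*?-(\w+)/`):

```perl
use strict;
use warnings;

# Hypothetical sentence containing two hyphenated words.
my $text = 'a sh0rt-t3rm plan, not a long-term one';

# Non-greedy wildcard with word boundaries: stops at the first hyphen,
# then captures up to the next word boundary.
my ($with_boundaries) = $text =~ /\b.*?-(.+?)\b/;

# Word characters only: no explicit boundaries needed, since \w+
# cannot run past the end of the word.
my ($with_word_chars) = $text =~ /\w*?-(\w+)/;

print "$with_boundaries\n";   # t3rm
print "$with_word_chars\n";   # t3rm
```

In both cases only the first hyphenated word contributes the capture; `term` from `long-term` is not reached.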
But this is clearly getting pickier and trickier, and would need careful inspection and testing, and more complete knowledge of what the data may look like.
Perl offers its own tutorial, perlretut, and the main full reference is perlre.
CodePudding user response:
`-([a-zA-Z0-9]+)` will match a `-` followed by a word, with just the word being captured.