How can I get the second part of a hyphenated word using regex?

Time:11-23

For example, I have the word sh0rt-t3rm. How can I get the t3rm part using a Perl regex?

I could get sh0rt by using [(a-zA-Z0-9)+][-], but [-][(a-zA-Z0-9)+] doesn't work to get t3rm.

CodePudding user response:

The regex syntax you used is not correct for getting either sh0rt or t3rm.

You swapped the square brackets and the parentheses, and the hyphen does not need to be inside square brackets.

To get sh0rt in sh0rt-t3rm, you might use, for example, one of:

| Regex | Explanation |
|---|---|
| \b([a-zA-Z0-9]+)- | \b is a word boundary that prevents a partial word match; the value is in capture group 1. |
| \b[a-zA-Z0-9]+(?=-) | Match the allowed characters in the character class, then assert a - to the right using a positive lookahead (?=-). |
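A minimal Perl sketch of both patterns, assuming the input is held in a variable named $string (a name chosen here for illustration). The lookahead version has no capture group, so it is run with /g in list context, which returns the matched text itself rather than just a true value.

```perl
use strict;
use warnings;

# Assumed sample input from the question.
my $string = 'sh0rt-t3rm';

# Pattern 1: capture group 1 holds the part before the hyphen.
my ($first_by_group) = $string =~ /\b([a-zA-Z0-9]+)-/;

# Pattern 2: no capture group, so /g in list context returns the
# matched text; (?=-) requires a following hyphen without consuming it.
my ($first_by_lookahead) = $string =~ /\b[a-zA-Z0-9]+(?=-)/g;

print "$first_by_group\n";
print "$first_by_lookahead\n";
```

Both lines print sh0rt. Without /g (and without a capture group), the list-context match would return (1) instead of the word.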

To get t3rm in sh0rt-t3rm, you might use, for example, one of:

| Regex | Explanation |
|---|---|
| -([a-zA-Z0-9]+)\b | The other way around: match a leading - and take the value from capture group 1. |
| -\K[a-zA-Z0-9]+\b | Match - and use \K to drop everything matched so far from the reported match, then match one or more of the allowed characters in the character class. |
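A similar sketch for the second part, again assuming a hypothetical $string variable holding the question's sample word. \K needs Perl 5.10 or later; because the \K pattern has no capture group, it is run with /g in list context so that the match returns the text after the \K.

```perl
use strict;
use warnings;

# Assumed sample input from the question.
my $string = 'sh0rt-t3rm';

# Pattern 3: leading hyphen, the part after it is in capture group 1.
my ($second_by_group) = $string =~ /-([a-zA-Z0-9]+)\b/;

# Pattern 4: \K removes the hyphen from the reported match, so the
# matched text is only the part after it; /g in list context returns it.
my ($second_by_keep) = $string =~ /-\K[a-zA-Z0-9]+\b/g;

print "$second_by_group\n";
print "$second_by_keep\n";
```

Both lines print t3rm.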

CodePudding user response:

If your whole target string is literally just sh0rt-t3rm, then you want everything that comes after the -.

So the barest version, tailored precisely to this description, is

my ($capture) = $string =~ /-(.+)/;

We need parentheses on the left-hand side so that the match runs in list context, because that is when it returns the captured groups (in scalar context it returns only true/false, normally 1 or '').
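To see the list-versus-scalar difference directly, here is a small sketch with a hypothetical $string; the only change between the two assignments is the parentheses around the variable on the left.

```perl
use strict;
use warnings;

# Assumed sample input from the question.
my $string = 'sh0rt-t3rm';

# List context (parentheses on the left): the match returns the captures.
my ($capture) = $string =~ /-(.+)/;

# Scalar context (no parentheses): the match returns only success/failure.
my $matched = $string =~ /-(.+)/;

print "$capture\n";  # t3rm
print "$matched\n";  # 1
```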

But what if the preceding text may contain a - itself? Then make sure to match everything up to the last -:

my ($capture) = $string =~ /.*-(.+)/;

Here the greedy * quantifier makes the preceding . match as much as it possibly can while the whole pattern still matches, so it runs up to the very last -.
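A quick comparison of the two patterns on an input with more than one hyphen; the string well-known-term is a made-up example, not from the question.

```perl
use strict;
use warnings;

# Hypothetical input containing two hyphens.
my $string = 'well-known-term';

# Without a leading .*, the match starts at the first hyphen.
my ($after_first) = $string =~ /-(.+)/;

# The greedy .* runs to the end of the string, then backtracks
# until it finds the last hyphen that still lets (.+) match.
my ($after_last) = $string =~ /.*-(.+)/;

print "$after_first\n";  # known-term
print "$after_last\n";   # term
```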

There are, of course, many other variations on what the data may look like besides being just one hyphenated word. In particular, if the word is part of a larger text, you may want to include word boundaries:

my ($capture) = $string =~ /\b.*?-(.+?)\b/;

Here we also need to restrain our wildcard-like pattern . by adding ? so that it is not greedy. This matches the first such hyphenated word in $string. But if only "word" characters are expected, we can simply use \w (instead of . and the word-boundary anchors):

my ($capture) = $string =~ /\w*?-(\w+)/;
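Both in-text patterns can be checked against a sentence; the sample text below is an assumption for illustration, with the hyphenated word in the middle.

```perl
use strict;
use warnings;

# Hypothetical sentence containing one hyphenated word.
my $text = 'a sh0rt-t3rm loan';

# Non-greedy wildcard bounded by word boundaries.
my ($by_dot) = $text =~ /\b.*?-(.+?)\b/;

# Word characters only: \w+ stops at the space after the word.
my ($by_word) = $text =~ /\w*?-(\w+)/;

print "$by_dot\n";   # t3rm
print "$by_word\n";  # t3rm
```

In the first pattern the lazy (.+?) grows one character at a time until \b succeeds, which happens only at the space after t3rm, since there is no word boundary between t and 3.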

Note that on plain ASCII data \w matches only [a-zA-Z0-9_], which excludes some characters that appear in normal English text, such as apostrophes. On decoded Unicode strings, Perl's \w is broader and also matches letters and digits from other writing systems.

But this is clearly getting pickier and would need careful inspection and testing, along with more complete knowledge of what the data may look like.

Perl ships its own regex tutorial, perlretut, and the full reference is perlre.

CodePudding user response:

-([a-zA-Z0-9]+) will match a - followed by one or more letters or digits, with just that word being captured.

