Regex to match a condition UNLESS it is a hashtag

Time:03-31

I apologise if this question has already been asked, but I have searched for an answer and have not been able to find one for my use case.

I am trying to write a regex that removes numbers, or words that contain digits, but only if they are not hashtags. I can successfully match words that contain digits, but I cannot seem to write a condition that ignores words beginning with a hashtag.

Here is a test string that I have been using to try and find a solution:

happening bit mediacon #2022ppopcon wearing stell naman today #sb19official 123 because h3llo also12 or 23old

I need a regex that will capture 123, h3llo, also12 and 23old but ignore #2022ppopcon and #sb19official.

I have tried the following patterns:

(#\w+\d+\w*)|(\w+\d+\w*): this successfully captures the hashtags in group 1 and the non-hashtags in group 2, but I cannot figure out how to make it select only group 2.
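One way to make the two-group idea work is to let the hashtag alternative consume hashtags and then discard those matches in code. The sketch below assumes Python (the question does not name a language) and uses a slightly loosened variant of the alternation, (#\w+)|(\w*\d\w*), so that any hashtag is swallowed by group 1 first:

```python
import re

text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")

# Group 1 consumes a whole hashtag; group 2 captures a word with at least one digit.
pattern = re.compile(r"(#\w+)|(\w*\d\w*)")

# With two groups, findall returns (group1, group2) tuples; keep only group-2 hits.
matches = [word for _tag, word in pattern.findall(text) if word]
print(matches)  # ['123', 'h3llo', 'also12', '23old']

# For removal: put hashtag matches back unchanged, replace everything else with "".
cleaned = pattern.sub(lambda m: m.group(1) or "", text)
cleaned = " ".join(cleaned.split())  # collapse the spaces left behind
print(cleaned)
```

This avoids lookarounds entirely, at the cost of handling the "which group matched" decision outside the regex.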

(?<!#)\w*\d+\w*: this excludes the first character after the hashtag but still captures the remaining characters of the hashtag. For example, in the string #2022ppopcon, it ignores #2 and captures 022ppopcon.
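A quick way to see why this attempt fails, sketched in Python (an assumption; the behaviour is the same in other engines with lookbehind): the lookbehind only checks the single character immediately before the match start, so the engine simply starts the match one character further into the hashtag, where that character is a word character rather than #.

```python
import re

# The second attempt from the question, with "+" after \d.
attempt = re.compile(r"(?<!#)\w*\d+\w*")

# The match cannot start right after "#", so it starts one character later.
print(attempt.findall("#2022ppopcon"))   # ['022ppopcon']
print(attempt.findall("#sb19official"))  # ['b19official']
```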

I have tried other patterns that I can no longer remember. I feel like I'm close but missing something important. Any help would be massively appreciated.

Answer:

You might use

(?<!\S)[^\W\d]*\d\w*
  • (?<!\S) Assert that the match starts at the beginning of the string or directly after a whitespace character
  • [^\W\d]* Match optional word chars except a digit
  • \d Match a single digit, so the word contains at least one
  • \w* Match optional word chars

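A minimal sketch of this pattern in Python (Python is an assumption; the question does not name a language, but Python's re module supports this fixed-width lookbehind). It lists the matches for the question's test string and, since the goal was removal, also deletes them with re.sub and collapses the leftover spaces:

```python
import re

text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")

# (?<!\S)   match starts at the beginning of the string or after whitespace
# [^\W\d]*  optional word characters that are not digits
# \d        one digit
# \w*       the rest of the word
pattern = re.compile(r"(?<!\S)[^\W\d]*\d\w*")

print(pattern.findall(text))  # ['123', 'h3llo', 'also12', '23old']

# Remove the matched words, then tidy up the whitespace they leave behind.
cleaned = " ".join(pattern.sub("", text).split())
print(cleaned)
```

The hashtags survive because the character before any position inside them is either # or a word character, both of which are non-whitespace, and a match cannot start at the # itself since # is neither a word character nor a digit.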

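If you want to allow a partial match (for example, a word directly preceded by punctuation rather than whitespace), you can drop the whitespace check and instead use the negative lookbehind (?<!#), which asserts that the match is not directly preceded by #, together with a word boundary \b: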

(?<!#)\b[^\W\d]*\d\w*

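To illustrate the difference between the two patterns, here is a small Python comparison on a made-up sample string (both the language and the sample are assumptions for illustration). The whitespace-anchored pattern rejects words that follow punctuation, while the \b version accepts them and still skips the hashtag #x1:

```python
import re

strict = re.compile(r"(?<!\S)[^\W\d]*\d\w*")
partial = re.compile(r"(?<!#)\b[^\W\d]*\d\w*")

# Hypothetical sample: digit words directly after "(" and ",".
sample = "(h3llo) #x1 a,b2"

print(strict.findall(sample))   # []
print(partial.findall(sample))  # ['h3llo', 'b2']
```

On the question's own test string both patterns return the same four words; they only differ when a word is glued to punctuation.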
