I apologise if this question has been asked but I have hunted for an answer and have not been able to find one for my use-case.
I am trying to write a regex that removes digits, or words that contain digits, but only when they are not part of a hashtag. I can successfully match words that have digits in them, but I cannot work out how to write a condition that ignores words beginning with a hashtag.
Here is a test string that I have been using to try and find a solution:
happening bit mediacon #2022ppopcon wearing stell naman today #sb19official 123 because h3llo also12 or 23old
I need a regex command that will capture the 123, h3llo, also12 and 23old but ignore the #2022ppopcon and #sb19official strings.
I have tried the following regex statements.
(#\w+\d+\w*)|(\w+\d+\w*)
This successfully captures the hashtags in group 1 and the non-hashtags in group 2, but I cannot figure out how to make it select group 2 only.
(?<!#)\w*\d+\w*
This excludes the first character after the hashtag but still captures all the remaining characters in the hashtag string. For example, in the string #2022ppopcon, it ignores #2 and captures 022ppopcon.
I have tried other patterns which I cannot remember anymore. I feel like I'm close but missing something important. Any help would be massively appreciated.
CodePudding user response:
You might use
(?<!\S)[^\W\d]*\d\w*
The pattern matches:
(?<!\S) Assert a whitespace boundary to the left
[^\W\d]* Match optional word chars except a digit
\d Match at least a single digit
\w* Match optional word chars
See a regex demo.
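As a minimal sketch of how this pattern could be applied to the test string from the question, assuming Python's built-in re module (the question does not name a language, and the helper names find_digit_words and remove_digit_words are hypothetical):

```python
import re

# Pattern from the answer: a whitespace boundary on the left, optional
# non-digit word characters, one digit, then any remaining word characters.
DIGIT_WORD = re.compile(r"(?<!\S)[^\W\d]*\d\w*")


def find_digit_words(text):
    """Return digit-containing words that start at a whitespace boundary."""
    return DIGIT_WORD.findall(text)


def remove_digit_words(text):
    """Delete those words, then collapse the whitespace left behind."""
    stripped = DIGIT_WORD.sub("", text)
    return " ".join(stripped.split())


sample = (
    "happening bit mediacon #2022ppopcon wearing stell naman today "
    "#sb19official 123 because h3llo also12 or 23old"
)

print(find_digit_words(sample))
# ['123', 'h3llo', 'also12', '23old']
print(remove_digit_words(sample))
# happening bit mediacon #2022ppopcon wearing stell naman today #sb19official because or
```

The extra split/join step is only there because removing a match leaves its surrounding spaces behind; it is not part of the regex itself.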
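If you want to allow a partial match, where the word may be preceded by other non-whitespace characters, you can use a word boundary together with a negative lookbehind asserting that there is no # directly to the left: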
(?<!#)\b[^\W\d]*\d\w*
See another regex demo.
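The difference between the two patterns can be sketched as follows, again assuming Python's re module; the names STRICT and PARTIAL and the sample sentence are made up for illustration:

```python
import re

# Whitespace-boundary version: the word must start the string or follow whitespace.
STRICT = re.compile(r"(?<!\S)[^\W\d]*\d\w*")
# Word-boundary version: the word may follow punctuation, just not a '#'.
PARTIAL = re.compile(r"(?<!#)\b[^\W\d]*\d\w*")

sample = "see (h3llo) and x-23old but not #sb19official"

print(STRICT.findall(sample))
# []
print(PARTIAL.findall(sample))
# ['h3llo', '23old']
```

One consequence of the word-boundary version is that it only checks the character directly to the left, so in a string such as #tag-2022 it would still match 2022, because that digit run follows a hyphen rather than the #.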