IF we have a sample field say 'note' note could be in ( 'xyz #abc-xyz-dfg. #abd' , '#1236&89'
, '## avc #1 "Changed xyz #2 from "string"\n - stop #3 #'
)
What I am doing is
-REGEXP_EXTRACT( regexp_replace( upper(note),'.?(\\#\\w )' ' $1' ), '.(\\#\\w )', 0 )
But what it is doing is:
if we have a tag as #abc-xyz it's only fetching #abc instead of #abc-xyz
or a tag as #abc-xyz. #abcd it's only fetching #abcd instead of #abc-xyz #abcd
Or if we have #1 or #2, it's skipping it, giving nulls
or if we have #1236&89 , it's giving #123, we need complete
Edit: Solution :
REGEXP_EXTRACT(
regexp_replace(
upper(note),
'.*?(#[\w?\d?\\`\\~!@#$%\\^&*-_ {}\\[\\]\\|//?;:\\"] )',
' $1'
),
'.*(#[\w?\d?\\`\\~!@#$%\\^&*-_ {}\\[\\]\\|//?;:\\"] )',
0
)
CodePudding user response:
\w
is equivalent to [a-zA-Z0-9_]
, which doesn't contain -
, %
, or &
. Instead of using \w
, you should define a character set that contains everything that you want to match, such as [a-zA-Z0-9_%&-]
.
Note that -
must be included as either the first or last character in the set in order for it to treated as a literal. When it exists between two characters it defines a range.