Input
Word-Word, Some other words and this is another word et another one
Expected output
Word-Word
Some
other
words
this
is
another
word
another
one
I have a table (t) with many strings like the one showed in the input.
I'm trying to get every word in the sentence but the comas (','), the word 'and', 'et', 'und' and of course every whitespace or sequence of whitespace that may be between words.
Regex that I'm using:
- Don't match whitespace
\\s
- Don't match whitespace as long as special characters
((\b[^\s] \b)((?<=\.\w).)?)
- doesn't work in postgres for some reason - Don't match a particular word
^(?!et$|and$|und$)
- doesn't work either
Query that I'm running
SELECT word FROM t,
unnest(regexp_split_to_array(t.word, E'Missing expression')) as word;
CodePudding user response:
You can use an extracting approach here in the following way:
SELECT regexp_matches(
'Word-Word, Some other words and this is another word et another one ',
E'\\y(?!(?:et|[ua]nd)\\y)\\w (?:-\\w )*',
'g');
See the online demo. Regex details:
\y
- a word boundary(?!(?:et|[ua]nd)\y)
- a negative lookahead that fails the match if there iset
,und
orand
as whole words immediately to the right of the current location\w (?:-\w )*
- one or more word chars and then zero or more occurrences of-
and one or more word char sequences
See the regex demo (converted to PCRE).