Let's say my data is like:
abcd abcd aaa 1234 1234566789 abcd abcd aaa 123456789 1234sfjsalfj
what I want to do is:
- if a number is 3 to 6 digits and there is
aaa
in front of it, then I keep it. - I do not need other numbers if no aaa ahead, or the number of digits is out of my range
(3-6)
So, this example should be transferred to:
abcd abcd aaa 1234 abcd abcd sfjsalfj
How to do this in Athena SQL? Maybe not a single query. Using with or any other query combination is also fine.
CodePudding user response:
You should be able to use this regex to do the replacement in one query:
(?<!aaa |\d)\d \s*|(aaa (\d{1,2}(?!\d)|\d{7,})\s*)
This looks for any digits not preceded by aaa
, or 1, 2, or >6 digits which are preceded by aaa
. Any matches should be replaced by the empty string (using the two parameter version of regexp_replace
i.e.
SELECT regexp_replace('abcd abcd aaa 1234 1234566789 abcd abcd aaa 123456789 1234sfjsalfj', '(?<!aaa |\d)\d \s*|(aaa (\d{1,2}(?!\d)|\d{7,})\s*)')
Regex demo on regex101