I want to write a regex for Spark SQL that returns the rows where a column contains three or more consecutive digits.
For example:
with temp as (
select '12' col
union
select '12a' col
union
select '1234' col --need to return
union
select 'ab234' col --need to return
union
select '33345abc' col --need to return
)
select col from temp
where col regexp '.*\\d{3,}'
When I run this script in Spark SQL, I get no results.
Is there a logic error in my regexp expression?
When I test the same query in Hive SQL, it works fine.
CodePudding user response:
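You may not need to double escape \d (how many backslashes actually reach the regex engine depends on how the query is submitted and on the parser's string-literal escaping settings):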
SELECT col
FROM temp
WHERE col REGEXP '\d{3,}';
Or, you might have to use [0-9] instead of \d, which avoids the backslash entirely:
SELECT col
FROM temp
WHERE col REGEXP '[0-9]{3,}';
Note that prefacing the pattern with .* is probably not needed, as REGEXP can match any part of the input.
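
To sanity-check the patterns outside Spark, here is a minimal sketch using Python's `re` module as a stand-in for the regex engine. It assumes the two engines treat these simple patterns the same way, which should be checked against the actual cluster. The helper name `rows_matching` is hypothetical. The last call simulates what could happen if an escaping layer swallowed the backslash, so that the engine sees `d{3,}` instead of `\d{3,}`.

```python
import re

# Sample values from the question's temp CTE.
rows = ["12", "12a", "1234", "ab234", "33345abc"]


def rows_matching(pattern, values):
    # re.search looks for the pattern anywhere in the string (a partial
    # match), so no leading ".*" is required.
    return [v for v in values if re.search(pattern, v)]


# Both spellings of "three or more consecutive digits".
with_digit_class = rows_matching(r"\d{3,}", rows)
with_bracket_class = rows_matching(r"[0-9]{3,}", rows)

# Simulated escaping problem: the backslash is lost, so the pattern now
# means "three or more letter d characters", which none of the rows contain.
lost_backslash = rows_matching(r"d{3,}", rows)

print(with_digit_class)
print(with_bracket_class)
print(lost_backslash)
```

Both digit patterns select the same three rows here, while the pattern with the lost backslash selects none, which would be consistent with the empty result seen in the question. The [0-9] form is the safer choice because it does not depend on how backslashes are escaped.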