Regex expression not working in SPARK SQL

Time:09-20

I want to write a regex for Spark SQL that returns the rows where a column contains three or more consecutive digits.

For example:

with temp as (
    select '12' col
    union
    select '12a' col
    union
    select '1234' col      -- need to return
    union
    select 'ab234' col     -- need to return
    union
    select '33345abc' col  -- need to return
)
select col from temp
where col regexp '.*\\d{3,}'

When I run this script in Spark SQL, I get no results.

Is there a logic error in my regexp expression?

When I test the same query in Hive SQL, it works fine.

CodePudding user response:

You may not need to double escape \d:

SELECT col
FROM temp
WHERE col REGEXP '\d{3,}';

Or, you might have to use [0-9] instead of \d:

SELECT col
FROM temp
WHERE col REGEXP '[0-9]{3,}';

Note that prefacing with .* is probably not needed as REGEXP can handle partial matches of the input.
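
To illustrate the partial-match point outside of Spark, here is a minimal Java sketch. Spark's documentation says the pattern should be a Java regular expression, so java.util.regex is used here. It is an assumption that REGEXP behaves like Matcher.find() (a substring search) and not like Matcher.matches(). The class and method names are hypothetical, chosen only for this example.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegexpPartialMatch {
    // Same character class as the second query above; no backslash escaping involved.
    static final Pattern THREE_DIGITS = Pattern.compile("[0-9]{3,}");

    // Substring search: true if three or more consecutive digits occur anywhere in the value.
    static boolean containsThreeDigits(String value) {
        return THREE_DIGITS.matcher(value).find();
    }

    // Whole-string match: true only if the entire value consists of three or more digits.
    static boolean isOnlyDigits(String value) {
        return THREE_DIGITS.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Sample values taken from the question.
        for (String col : List.of("12", "12a", "1234", "ab234", "33345abc")) {
            System.out.println(col + " find=" + containsThreeDigits(col)
                    + " matches=" + isOnlyDigits(col));
        }
    }
}
```

With find(), no leading .* is needed to pick up 'ab234' or '33345abc'. A whole-string match would need .* on both sides of the pattern to accept them.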
