Home > Blockchain >  how to search for specific whole words within a string , via SQL, compatible with both HIVE/IMPALA
how to search for specific whole words within a string , via SQL, compatible with both HIVE/IMPALA

Time:12-18

I need to search a column(varchar) for specific whole words. I'm using the query below but not getting the desired results;

    select *
    from table1
    WHERE upper(c.name) RLIKE ('FECHADO|CIERRE|CLOSED|REVISTO. NORMAL.')

My problem is to guarantee that, for example with the word 'CLOSED', that only matches; 'Case Closed', but not 'Case Disclosed'. The query above can't match whole words only. Can anyone help me to find the best way to achieve those results, both in HIVE an IMPALA.

My best regards

CodePudding user response:

I explained a similar solution here: HIVE/PIG JOIN Based on SUBSTRING match This will make a lot of records and take a long time to run but I think it will run better than running a like as it uses a join to complete the matching instead of searching all records.

If this is a hive only solution I would explode on " " with lateral view and then match on closed.

  1. split - makes an array based on splitting on " "

  2. LATERAL VIEW EXPLODE - changes an array into rows.

    WITH (
        select *
        from table1
        LATERAL VIEW EXPLODE(SPLIT( name, " ")) table_name as names
    ) as exploded_names
    
    select * from exploded_names
        where 
           names IN ('FECHADO','CIERRE','CLOSED','REVISTO.')
    

CodePudding user response:

You can add word boundary \\b to match only exact words:

rlike '(?i)\\bFECHADO\\b|\\bCIERRE\\b|\\bCLOSED\\b'

(?i) means case insensitive, no need to use UPPER.

And the last alternative in your regex pattern is REVISTO. NORMAL.

If dots in it should be literally dots, use \\.

Like this: REVISTO\\. NORMAL\\.

Dot in regexp means any character and should be shielded with two backslashes to match dot literally.

Above regex works in Hive. Unfortunately I have no Impala to test it

  • Related