Home > Software design >  How to write redshift aws query to search for a value in comma delimited values
How to write redshift aws query to search for a value in comma delimited values

Time:01-03

table1

user_id country_code
1 'IN,AU,AC'
2 'MX,IN'

table2

user_id valid_country
1 'IN'
1 'AU'
2 'MX'
3 'YT'
4 'RU'

As you can see, some entries in the country_code column are multiple codes separated by commas. I would like to print user_id in table1 and their corresponding country_code only if they are valid. To check for validity here I need to use table2 which has user_id and valid_country.

The desired output is:

user_id country_code
1 'IN'
1 'AU'
2 'MX'

Query looks like

select tb1.user_id, country_code from table1 tb1, table2 tb2 where tb1.user_id=tb2.user_id and <Here I need to check if tb2.country_code is there in tb1.country_code (codes separated by commas)>

Are there any simple solution that I could check valid_country in the comma separated values.

CodePudding user response:

The simple way isn't always the best. There are a number of corner cases that can arise here (like are all country codes 2 letters). That said a LIKE clause would be simple:

select tb1.user_id, valid_country as country_code
from table1 tb1, table2 tb2 
where tb1.user_id=tb2.user_id 
  and tb1.country_code like '%'||tb2.valid_country||'%'

Or if we are to put this in modern SQL syntax:

select tb1.user_id, valid_country as country_code
from table1 tb1 join table2 tb2 
on tb1.user_id=tb2.user_id 
  and tb1.country_code like '%'||tb2.valid_country||'%'

CodePudding user response:

Try this:

a) Verticalise tb1 by CROSS JOINing it with a series of consecutive integers (which I supply in a Common Table Expression), and applying the SPLIT_PART() function to break the comma delimited list into single element.

b) INNER JOIN the verticalised result with the valid user_id/country code combinations table on an equi-join on both columns.

WITH   
-- your table 1, don't use in end query ...                                                                                                                                                                                      
tb1(user_id,country_code) AS (
          SELECT 1,'IN,AU,AC'
UNION ALL SELECT 2,'MX,IN'
)
,
-- your table 2, don't use in end query ...                                                                                                                                                                                      
tb2(user_id,valid_country) AS (
          SELECT 1,'IN'
UNION ALL SELECT 1,'AU'
UNION ALL SELECT 2,'MX'
UNION ALL SELECT 3,'YT'
UNION ALL SELECT 4,'RU'
)
-- real query starts here, replace following comma with "WITH" ...
,
i(i) AS ( -- need a series of integers ...
          SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
)
,
vertical AS (
  SELECT
    tb1.user_id
  , i
  , SPLIT_PART(country_code,',',i) AS valid_country
  FROM tb1 CROSS JOIN i
  WHERE SPLIT_PART(country_code,',',i) <> ''
)
SELECT
  vertical.user_id
, vertical.valid_country
FROM vertical
JOIN tb2 USING(user_id,valid_country)
ORDER BY vertical.user_id,vertical.i
;
-- out  user_id | valid_country 
-- out --------- ---------------
-- out        1 | IN
-- out        1 | AU
-- out        2 | MX
  • Related