I have a table that looks like:
id | site_names | site_addresses | industries | feis |
---|---|---|---|---|
30 | Borden Incorporated | 198 Saluda St , Chester , SC , 29706-1579 , United States|198 Saluda St, Chester, SC 29706, USA|198 Saluda St Chester SC 29706-1579 United States | Food and Cosmetics | 12345|45678 |
31 | Butterkrust Bakeries, Inc.|Flowers Baking Co. of Lakeland, LLC|Southern Bakeries, Inc. dba Butterkrust Bakeries | null | Food|Food and Cosmetics | 12345 |
33 | Church & Dwight Canada Corp. | 5485 RUE FERRIER , , MONTREAL, QUEBEC Quebec , , -- , CA | null | null |
I want to split the table into a materialized view where each row is one of the combinations possible when you split site_names, site_addresses, industries, and feis up. So for example, a few rows from this data would be:
id | site_name | site_address | industry | fei |
---|---|---|---|---|
30 | Borden Incorporated | 198 Saluda St , Chester , SC | Food and Cosmetics | 12345 |
30 | Borden Incorporated | 198 Saluda St , Chester , SC | Food and Cosmetics | 45678 |
30 | Borden Incorporated | 198 Saluda St, Chester, SC 29706, USA | Food and Cosmetics | 12345 |
30 | Borden Incorporated | 198 Saluda St, Chester, SC 29706, USA | Food and Cosmetics | 45678 |
... | ||||
31 | Butterkrust Bakeries, Inc. | null | Food | 12345 |
31 | Flowers Baking Co. of Lakeland, LLC | null | Food | 12345 |
I have tried several ways to accomplish this. The closest I got was with this code:
(
with Expanded2 as (
select raw_site_data.id as id_fei,
feis.feis
from raw_site_data,
unnest(string_to_array(raw_site_data.feis, '|')) feis
),
Expanded3 as (
select raw_site_data.id as id_name,
site_names.site_names
from raw_site_data,
unnest(string_to_array(raw_site_data.site_names, '|')) site_names
)
,
Expanded4 as (
select raw_site_data.id as id_address,
site_addresses.site_addresses
from raw_site_data,
unnest(string_to_array(raw_site_data.site_addresses, '|')) site_addresses)
,
Expanded5 as (
select raw_site_data.id as id_industry,
industries.industries
from raw_site_data,
unnest(string_to_array(raw_site_data.industries, '|')) industries)
select id_fei as site_id, feis as fei, site_names as site_name, site_addresses as site_address, industries as industry
from Expanded2, Expanded3, Expanded4, Expanded5 where Expanded2.id_fei = Expanded3.id_name and Expanded3.id_name = Expanded4.id_address and Expanded4.id_address = Expanded5.id_industry
);
which is really close but it does not include any of the rows with nulls in it. Does anyone know how I can do this query while including rows with nulls in the result?
Couple more potentially relevant background points:
- id is a unique, non-null integer in the original table
- Site_names, site_addresses, industries, and feis all need to be split and all may have null values that I want included
- I am using Postgres 13
Thanks!
CodePudding user response:
If you want all combinations, you can use a single query:
select rd.id, site_name, site_address, fei
from raw_data rd left join lateral
regexp_split_to_table(rd.site_names, '\|') site_name
on 1=1 left join lateral
regexp_split_to_table(rd.site_addresses, '\|') site_address
on 1=1 left join lateral
regexp_split_to_table(rd.feis, '\|') fei
on 1=1;