Home > Software design >  How to unnest multiple columns while including nulls
How to unnest multiple columns while including nulls

Time:09-25

I have a table that looks like:

id site_names site_addresses industries feis
30 Borden Incorporated 198 Saluda St , Chester , SC , 29706-1579 , United States|198 Saluda St, Chester, SC 29706, USA|198 Saluda St Chester SC 29706-1579 United States Food and Cosmetics 12345|45678
31 Butterkrust Bakeries, Inc.|Flowers Baking Co. of Lakeland, LLC|Southern Bakeries, Inc. dba Butterkrust Bakeries null Food|Food and Cosmetics 12345
33 Church & Dwight Canada Corp. 5485 RUE FERRIER , , MONTREAL, QUEBEC Quebec , , -- , CA null null

I want to split the table into a materialized view where each row is one of the combinations possible when you split site_names, site_addresses, industries, and feis up. So for example, a few rows from this data would be:

id site_name site_address industry fei
30 Borden Incorporated 198 Saluda St , Chester , SC Food and Cosmetics 12345
30 Borden Incorporated 198 Saluda St , Chester , SC Food and Cosmetics 45678
30 Borden Incorporated 198 Saluda St, Chester, SC 29706, USA Food and Cosmetics 12345
30 Borden Incorporated 198 Saluda St, Chester, SC 29706, USA Food and Cosmetics 45678
...
31 Butterkrust Bakeries, Inc. null Food 12345
31 Flowers Baking Co. of Lakeland, LLC null Food 12345

I have tried several ways to accomplish this. The closest I got was with this code:

(
with Expanded2 as (
    select raw_site_data.id as id_fei,
           feis.feis
    from raw_site_data,
         unnest(string_to_array(raw_site_data.feis, '|')) feis
),
     Expanded3 as (
         select raw_site_data.id as id_name,
                site_names.site_names
         from raw_site_data,
              unnest(string_to_array(raw_site_data.site_names, '|')) site_names
     )
     ,
     Expanded4 as (
         select raw_site_data.id as id_address,
                site_addresses.site_addresses
         from raw_site_data,
              unnest(string_to_array(raw_site_data.site_addresses, '|')) site_addresses)
     ,
     Expanded5 as (
         select raw_site_data.id as id_industry,
                industries.industries
         from raw_site_data,
              unnest(string_to_array(raw_site_data.industries, '|')) industries)
select id_fei as site_id, feis as fei, site_names as site_name, site_addresses as site_address, industries as industry
    from Expanded2, Expanded3, Expanded4, Expanded5 where Expanded2.id_fei = Expanded3.id_name and Expanded3.id_name = Expanded4.id_address and Expanded4.id_address = Expanded5.id_industry
    );

which is really close but it does not include any of the rows with nulls in it. Does anyone know how I can do this query while including rows with nulls in the result?

Couple more potentially relevant background points:

  1. id is a unique, non-null integer in the original table
  2. Site_names, site_addresses, industries, and feis all need to be split and all may have null values that I want included
  3. I am using Postgres 13

Thanks!

CodePudding user response:

If you want all combinations, you can use a single query:

select rd.id, site_name, site_address, fei
from raw_data rd left join lateral
     regexp_split_to_table(rd.site_names, '\|') site_name
     on 1=1 left join lateral
     regexp_split_to_table(rd.site_addresses, '\|') site_address
     on 1=1 left join lateral
     regexp_split_to_table(rd.feis, '\|') fei
     on 1=1;
  • Related