Home > front end >  Filtering a Snowflake SQL SELECT statement
Filtering a Snowflake SQL SELECT statement

Time:07-16

I'm trying to find a good maybe subquery strategy to select a value within a document by matching another value, with multiple documents. I can execute a query, but not in a scalable methodology for data that changes. Imagine I have the below dataset:

{
    "company": [{
            "company_data": [{
                "Org": "Company1",
                "Position": "Manager",
                "Department": "Engineering",
                "Name": "JohnDoe"
            }, {
                "Org": "Company1",
                "Position": "SrEngineer",
                "Department": "Engineering",
                "Name": "JaneDoe"
            }, {
                "Org": "Company1",
                "Position": "StaffEngineer",
                "Department": "Engineering",
                "Name": "JamesDoe"
            }]
        },
        {
            "company_data": [{
                "Org": "Company2",
                "Position": "Manager",
                "Department": "Engineering",
                "Name": "JohannaDoe"
            }, {
                "Org": "Company2",
                "Position": "SrEngineer",
                "Department": "Engineering",
                "Name": "FidoDoe"
            }, {
                "Org": "Company2",
                "Position": "StaffEngineer",
                "Department": "Engineering",
                "Name": "BaileyDoe"
            }, {
                "Org": "Company2",
                "Position": "Intern",
                "Department": "Fetching",
                "Name": "BaileyDoe"
            }]
        }, {
            "company_data": [{
                "Org": "Company3",
                "Position": "Custodian",
                "Department": "Operations",
                "Name": "AristotleDoe"
            }, {
                "Org": "Company3",
                "Position": "SrEngineer",
                "Department": "Engineering",
                "Name": "SocratesDoe"
            }, {
                "Org": "Company3",
                "Position": "Manager",
                "Department": "Engineering",
                "Name": "BaconDoe"
            }, {
                "Org": "Company3",
                "Position": "Intern",
                "Department": "Fetching",
                "Name": "NietzcheDoe"
            }]
        }
    ]
}

How can I write a query that pulls the name of the manager from each document? This is to form a table like:

COMPANY     MANAGER     CONTENTS
Company1    JohnDoe     [{'Position':'Manager',...*json*...}]
Company2    JohannaDoe  [{'Position':'Manager',...*json*...}]
Company3    BaconDoe    [{'Position':'Custodian',...*json*...}]

In my real dataset I can usually depend on indexing in the select statement. In the typical case that would be selecting the first value:

SELECT 
     company:company_data:Org as company, 
     company:company_data[0]:Name as manager,
     company:company_data as contents
FROM
     poorly_imagined_table;

But in the 3rd case, the manager is not a part of the first entry [0]. How can I filter the SQL statement so that it looks something like this (which does not work)?

SELECT 
     company:company_data:Org as company, 
     company:company_data[WHERE company:company_data:Position = 'Manager']:Name as manager,
     company:company_data as contents
FROM
     poorly_imagined_table;

Note: a standard WHERE clause below SELECT does not work in this case since I need to specifically match a value when selecting rather than filtering lower with WHERE.

Let me know if anyone has a suggestion! Thanks!

CodePudding user response:

You need to flatten this twice, once to flatten the company array and once to flatten the company_data array inside each company:

create or replace table T1(V variant);

insert into T1 select parse_json($$ <Your JSON>  $$);

If you're just looking to get the information for the managers:

select   CD.VALUE:Org::string  as COMPANY
        ,CD.VALUE:Name::string as MANAGER
        ,CD.VALUE              as CONTENTS
from     T1
        ,table(flatten(v:company)) C
        ,table(flatten(VALUE:company_data)) CD
where    CD.VALUE:Position::string ilike 'Manager'
;

Output:

COMPANY MANAGER CONTENTS
Company1 JohnDoe { company_data object here... }
Company2 JohannaDoe { company_data object here... }
Company3 BaconDoe { company_data object here... }

If you're trying to get the information for all employees and relate them to their managers:

with EMPLOYEE_LIST as
(
select   CD.VALUE:Org::string  as COMPANY
        ,CD.VALUE:Name::string as EMPLOYEE_NAME
        ,CD.VALUE              as CONTENTS
from     T1
        ,table(flatten(v:company)) C
        ,table(flatten(VALUE:company_data)) CD
), MANAGER_LIST as
(
select COMPANY, EMPLOYEE_NAME from EMPLOYEE_LIST
where  CONTENTS:Position::string ilike 'Manager'
)
select   E.COMPANY
        ,E.EMPLOYEE_NAME
        ,M.EMPLOYEE_NAME as MANAGER
        ,CONTENTS
from    EMPLOYEE_LIST E
    left join MANAGER_LIST M on E.COMPANY = M.COMPANY
;

Output:

COMPANY EMPLOYEE_NAME MANAGER CONTENTS
Company1 JohnDoe JohnDoe company_data object here...
Company1 JaneDoe JohnDoe company_data object here...
Company1 JamesDoe JohnDoe company_data object here...
Company2 JohannaDoe JohannaDoe company_data object here...
Company2 FidoDoe JohannaDoe company_data object here...
Company2 BaileyDoe JohannaDoe company_data object here...
Company2 BaileyDoe JohannaDoe company_data object here...
Company3 AristotleDoe BaconDoe company_data object here...
Company3 SocratesDoe BaconDoe company_data object here...
Company3 BaconDoe BaconDoe company_data object here...
Company3 NietzcheDoe BaconDoe company_data object here...

CodePudding user response:

Sometimes it makes sense to get out of SQL and into a JS/Python UDF when parsing documents.

This one would do it:

create or replace function find_manager(company_data variant)
returns string
language python
handler = 'x'
runtime_version = '3.8'
as $$
def x(company_data):
    for dep in company_data:
        if dep['Position'] == 'Manager':
            return dep['Name']

In action:

with data as (
select $$
{
    "company": [{
            "company_data": [{
                "Org": "Company1",
                "Position": "Manager",
                "Department": "Engineering",
                "Name": "JohnDoe"
            }, {
                "Org": "Company1",
                "Position": "SrEngineer",
                "Department": "Engineering",
                "Name": "JaneDoe"
            }, {
                "Org": "Company1",
                "Position": "StaffEngineer",
                "Department": "Engineering",
                "Name": "JamesDoe"
            }]
        },
        {
            "company_data": [{
                "Org": "Company2",
                "Position": "Manager",
                "Department": "Engineering",
                "Name": "JohannaDoe"
            }, {
                "Org": "Company2",
                "Position": "SrEngineer",
                "Department": "Engineering",
                "Name": "FidoDoe"
            }, {
                "Org": "Company2",
                "Position": "StaffEngineer",
                "Department": "Engineering",
                "Name": "BaileyDoe"
            }, {
                "Org": "Company2",
                "Position": "Intern",
                "Department": "Fetching",
                "Name": "BaileyDoe"
            }]
        }, {
            "company_data": [{
                "Org": "Company3",
                "Position": "Custodian",
                "Department": "Operations",
                "Name": "AristotleDoe"
            }, {
                "Org": "Company3",
                "Position": "SrEngineer",
                "Department": "Engineering",
                "Name": "SocratesDoe"
            }, {
                "Org": "Company3",
                "Position": "Manager",
                "Department": "Engineering",
                "Name": "BaconDoe"
            }, {
                "Org": "Company3",
                "Position": "Intern",
                "Department": "Fetching",
                "Name": "NietzcheDoe"
            }]
        }
    ]
}$$ doc
)

select x.value:company_data[0]:Org::string company_name
    , find_manager(x.value:company_data)
from data, table(flatten(parse_json(doc):company)) x;

enter image description here

  • Related