I'm trying to find a good maybe subquery strategy to select a value within a document by matching another value, with multiple documents. I can execute a query, but not in a scalable methodology for data that changes. Imagine I have the below dataset:
{
"company": [{
"company_data": [{
"Org": "Company1",
"Position": "Manager",
"Department": "Engineering",
"Name": "JohnDoe"
}, {
"Org": "Company1",
"Position": "SrEngineer",
"Department": "Engineering",
"Name": "JaneDoe"
}, {
"Org": "Company1",
"Position": "StaffEngineer",
"Department": "Engineering",
"Name": "JamesDoe"
}]
},
{
"company_data": [{
"Org": "Company2",
"Position": "Manager",
"Department": "Engineering",
"Name": "JohannaDoe"
}, {
"Org": "Company2",
"Position": "SrEngineer",
"Department": "Engineering",
"Name": "FidoDoe"
}, {
"Org": "Company2",
"Position": "StaffEngineer",
"Department": "Engineering",
"Name": "BaileyDoe"
}, {
"Org": "Company2",
"Position": "Intern",
"Department": "Fetching",
"Name": "BaileyDoe"
}]
}, {
"company_data": [{
"Org": "Company3",
"Position": "Custodian",
"Department": "Operations",
"Name": "AristotleDoe"
}, {
"Org": "Company3",
"Position": "SrEngineer",
"Department": "Engineering",
"Name": "SocratesDoe"
}, {
"Org": "Company3",
"Position": "Manager",
"Department": "Engineering",
"Name": "BaconDoe"
}, {
"Org": "Company3",
"Position": "Intern",
"Department": "Fetching",
"Name": "NietzcheDoe"
}]
}
]
}
How can I write a query that pulls the name of the manager from each document? This is to form a table like:
COMPANY MANAGER CONTENTS
Company1 JohnDoe [{'Position':'Manager',...*json*...}]
Company2 JohannaDoe [{'Position':'Manager',...*json*...}]
Company3 BaconDoe [{'Position':'Custodian',...*json*...}]
In my real dataset I can usually depend on indexing in the select statement. In the typical case that would be selecting the first value:
SELECT
company:company_data:Org as company,
company:company_data[0]:Name as manager,
company:company_data as contents
FROM
poorly_imagined_table;
But in the 3rd case, the manager is not a part of the first entry [0]. How can I filter the SQL statement so that it looks something like this (which does not work)?
SELECT
company:company_data:Org as company,
company:company_data[WHERE company:company_data:Position = 'Manager']:Name as manager,
company:company_data as contents
FROM
poorly_imagined_table;
Note: a standard WHERE clause below SELECT does not work in this case since I need to specifically match a value when selecting rather than filtering lower with WHERE.
Let me know if anyone has a suggestion! Thanks!
CodePudding user response:
You need to flatten this twice, once to flatten the company
array and once to flatten the company_data
array inside each company
:
create or replace table T1(V variant);
insert into T1 select parse_json($$ <Your JSON> $$);
If you're just looking to get the information for the managers:
select CD.VALUE:Org::string as COMPANY
,CD.VALUE:Name::string as MANAGER
,CD.VALUE as CONTENTS
from T1
,table(flatten(v:company)) C
,table(flatten(VALUE:company_data)) CD
where CD.VALUE:Position::string ilike 'Manager'
;
Output:
COMPANY | MANAGER | CONTENTS |
---|---|---|
Company1 | JohnDoe | { company_data object here... } |
Company2 | JohannaDoe | { company_data object here... } |
Company3 | BaconDoe | { company_data object here... } |
If you're trying to get the information for all employees and relate them to their managers:
with EMPLOYEE_LIST as
(
select CD.VALUE:Org::string as COMPANY
,CD.VALUE:Name::string as EMPLOYEE_NAME
,CD.VALUE as CONTENTS
from T1
,table(flatten(v:company)) C
,table(flatten(VALUE:company_data)) CD
), MANAGER_LIST as
(
select COMPANY, EMPLOYEE_NAME from EMPLOYEE_LIST
where CONTENTS:Position::string ilike 'Manager'
)
select E.COMPANY
,E.EMPLOYEE_NAME
,M.EMPLOYEE_NAME as MANAGER
,CONTENTS
from EMPLOYEE_LIST E
left join MANAGER_LIST M on E.COMPANY = M.COMPANY
;
Output:
COMPANY | EMPLOYEE_NAME | MANAGER | CONTENTS |
---|---|---|---|
Company1 | JohnDoe | JohnDoe | company_data object here... |
Company1 | JaneDoe | JohnDoe | company_data object here... |
Company1 | JamesDoe | JohnDoe | company_data object here... |
Company2 | JohannaDoe | JohannaDoe | company_data object here... |
Company2 | FidoDoe | JohannaDoe | company_data object here... |
Company2 | BaileyDoe | JohannaDoe | company_data object here... |
Company2 | BaileyDoe | JohannaDoe | company_data object here... |
Company3 | AristotleDoe | BaconDoe | company_data object here... |
Company3 | SocratesDoe | BaconDoe | company_data object here... |
Company3 | BaconDoe | BaconDoe | company_data object here... |
Company3 | NietzcheDoe | BaconDoe | company_data object here... |
CodePudding user response:
Sometimes it makes sense to get out of SQL and into a JS/Python UDF when parsing documents.
This one would do it:
create or replace function find_manager(company_data variant)
returns string
language python
handler = 'x'
runtime_version = '3.8'
as $$
def x(company_data):
for dep in company_data:
if dep['Position'] == 'Manager':
return dep['Name']
In action:
with data as (
select $$
{
"company": [{
"company_data": [{
"Org": "Company1",
"Position": "Manager",
"Department": "Engineering",
"Name": "JohnDoe"
}, {
"Org": "Company1",
"Position": "SrEngineer",
"Department": "Engineering",
"Name": "JaneDoe"
}, {
"Org": "Company1",
"Position": "StaffEngineer",
"Department": "Engineering",
"Name": "JamesDoe"
}]
},
{
"company_data": [{
"Org": "Company2",
"Position": "Manager",
"Department": "Engineering",
"Name": "JohannaDoe"
}, {
"Org": "Company2",
"Position": "SrEngineer",
"Department": "Engineering",
"Name": "FidoDoe"
}, {
"Org": "Company2",
"Position": "StaffEngineer",
"Department": "Engineering",
"Name": "BaileyDoe"
}, {
"Org": "Company2",
"Position": "Intern",
"Department": "Fetching",
"Name": "BaileyDoe"
}]
}, {
"company_data": [{
"Org": "Company3",
"Position": "Custodian",
"Department": "Operations",
"Name": "AristotleDoe"
}, {
"Org": "Company3",
"Position": "SrEngineer",
"Department": "Engineering",
"Name": "SocratesDoe"
}, {
"Org": "Company3",
"Position": "Manager",
"Department": "Engineering",
"Name": "BaconDoe"
}, {
"Org": "Company3",
"Position": "Intern",
"Department": "Fetching",
"Name": "NietzcheDoe"
}]
}
]
}$$ doc
)
select x.value:company_data[0]:Org::string company_name
, find_manager(x.value:company_data)
from data, table(flatten(parse_json(doc):company)) x;