I'm sort of a beginner to SQL Server.. and I want to get results for when there is more than one SYS_ETL_SOURCE for a given LOCAL_SCHOOL_YEAR for a DISTRICT_CODE.
SELECT DISTRICT_CODE, LOCAL_SCHOOL_YEAR, SYS_ETL_SOURCE
FROM K12INTEL_DW.FTBL_ENROLLMENTS
GROUP BY DISTRICT_CODE, LOCAL_SCHOOL_YEAR, SYS_ETL_SOURCE
ORDER BY 1, 2
The above query gets me all data for each school year and each sys_etl_source, but I would like to only see those which have more than one SYS_ETL_SOURCE for a given year.
Edit:
This is what I am looking for...
district_code | local_school_year | sys_etl_source |
---|---|---|
1234 | 2020-2021 | ETL_SOURCE 1 |
1234 | 2020-2021 | ETL_SOURCE 2 |
This is what I currently get
district_code | local_school_year | sys_etl_source |
---|---|---|
1234 | 2018-2019 | ETL_SOURCE 1 |
1234 | 2019-2020 | ETL_SOURCE 1 |
1234 | 2020-2021 | ETL_SOURCE 1 |
1234 | 2020-2021 | ETL_SOURCE 2 |
1234 | 2021-2022 | ETL_SOURCE 2 |
Edit 2:
This is what FTBL_ENROLLMENTS looks like...
enrollments_key | student_key | start_date | end_date | district_code | local_school_year | sys_etl_source |
---|---|---|---|---|---|---|
xxxxx | xxxxxxxxx | 1/01/2020 | 6/31/21 | 2021 | 2020-2021 | ETL_SOURCE |
We have some school years that have different sys_etl_sources and because of that there are some duplicate records.
CodePudding user response:
You can get this using a correlated subquery (I added an alias for your initial table to be E1 for simplicity/clarity):
WHERE EXISTS (SELECT DISTRICT_CODE, LOCAL_SCHOOL_YEAR, COUNT() FROM K12INTEL_DW.FTBL_ENROLLMENTS E2 WHERE E1.DISTRICT_CODE = E2.DISTRICT_CODE and E1.LOCAL_SCHOOL_YEAR = E2.LOCAL_SCHOOL_YEAR GROUP BY DISTRICT_CODE, LOCAL_SCHOOL_YEAR HAVING COUNT() > 1)
CodePudding user response:
A window function is probably the most efficient
SELECT
e.DISTRICT_CODE,
e.LOCAL_SCHOOL_YEAR,
e.SYS_ETL_SOURCE
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY e.DISTRICT_CODE, e.LOCAL_SCHOOL_YEAR) AS count
FROM K12INTEL_DW.FTBL_ENROLLMENTS e
) e
WHERE e.count > 1;