Platform: Snowflake
I am building a report that requires the most up to date department information. Some departments have been updated and take on a new timestamp date showing they are more up to date. I've only been dabbling in SQL for a year, and so far I've had a great run with most other queries I use to automate dashboards, but this one is stumping me even after googling etc.
I tried writing this:
select MAX(EFFDT), DEPTID, EFF_STATUS, DESCR
from "DATA_LAKE"."PS_FS"."PS_DEPT_TBL";
I get this error:
SQL compilation error: [PS_DEPT_TBL.DEPTID] is not a valid group by expression
Perhaps a partition is needed? If so, how would I structure it?
Example:
Row Dept EFFDT
Row 1 DEPT78 1990-01-01 00:00:00.0000
Row 2 DEPT78 1990-01-02 00:00:00.0000 < I want this record/row to populate and not row 1 above.
Row 3 DEPT79 1990-01-01 00:00:00.0000
Row 4 DEPT79 1990-01-02 00:00:00.0000 < I want this record/row to populate and not row 3 above.
Any help is greatly appreciated.
I tried researching different ways to pull most up to date data but failed as I believe I need a more dense query to get desired results.
CodePudding user response:
The sample data does not line up with the SQL but it appears that you could group by the columns other than EFFDT:
select MAX(EFFDT), DEPTID, EFF_STATUS, DESCR
from "DATA_LAKE"."PS_FS"."PS_DEPT_TBL"
group by DEPTID, EFF_STATUS, DESCR
;
If that does not work it's because it's grouping on too many columns. If you just need to group by "DEPTID" for example but need the other columns:
select MAX(EFFDT), DEPTID, max(EFF_STATUS) as EFF_STATUS, max(DESCR) as DESCR
from "DATA_LAKE"."PS_FS"."PS_DEPT_TBL"
group by DEPTID
;
Someone may ask why use max
when the max has no significance and there's an any_value
function that would get the same result. The reason is that the any_value is currently marked as ineligible to use the result set cache. Using max or min will allow the result set cache to be used if the query is eligible for it.