I have a table that I want to monitor for differences in counts based on a group of records.
In the table below, records are grouped by Source (Customer, Product, Service). I want to detect when the Total_Count of the 'staging_' and 'delta_' Entity rows differs within a Source.
For example, for the Source Customer, the Total_Count of staging_Customer and delta_Customer is the same, so the output should return the difference 755 - 755 = 0. The same applies to the Source Service: 340 - 340 = 0.
However, for the Source Product, the Total_Count of staging_Product and delta_Product is not the same, so the query should return the difference: 240 - 0 = 240.
Each Source always has 4 Entity records, and the naming convention is always the same (hdp_[Source]_sql, staging_[Source], delta_[Source], final_[Source]).
Run_Date | Source | Process | Entity | Total_Count |
---|---|---|---|---|
20180101 | Customer | tr_Customer_Data | hdp_Customer_sql | 1500 |
20180101 | Customer | tr_Customer_Data | staging_Customer | 755 |
20180101 | Customer | tr_Customer_Data | delta_Customer | 755 |
20180101 | Customer | tr_Customer_Data | final_Customer | 755 |
20180101 | Product | tr_Product_Data | hdp_Product_sql | 570 |
20180101 | Product | tr_Product_Data | staging_Product | 240 |
20180101 | Product | tr_Product_Data | delta_Product | 0 |
20180101 | Product | tr_Product_Data | final_Product | 0 |
20180101 | Service | tr_Service_Data | hdp_Service_sql | 2300 |
20180101 | Service | tr_Service_Data | staging_Service | 340 |
20180101 | Service | tr_Service_Data | delta_Service | 340 |
20180101 | Service | tr_Service_Data | final_Service | 340 |
Expected output:
Run_Date | Source | Differences |
---|---|---|
20180101 | Customer | 0 |
20180101 | Product | 240 |
20180101 | Service | 0 |
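For reference, the comparison described above can be sketched with conditional aggregation. This is only a sketch under assumptions: the table name `t` is a placeholder, each Source is assumed to have exactly one 'staging_' row and one 'delta_' row per Run_Date, and `like` is assumed to match the Entity prefixes under the database's collation.

```sql
-- Sketch only: t is a placeholder table name (an assumption).
-- Sums the staging_ and delta_ counts separately per Run_Date and Source,
-- then subtracts them; all other Entity rows contribute 0 to both sums.
select Run_Date
      ,Source
      ,sum(case when Entity like 'staging%' then Total_Count else 0 end)
     - sum(case when Entity like 'delta%'   then Total_Count else 0 end) as Differences
from t
group by Run_Date, Source
order by Run_Date, Source;
```

With the sample rows above, this gives 755 - 755 = 0 for Customer, 240 - 0 = 240 for Product, and 340 - 340 = 0 for Service. The result is signed, so a delta count larger than the staging count would show up as a negative difference.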
Answer:
I filtered out the unnecessary rows using a `where` clause, and then I used `lag` to compare the Total_Count of the two entities in question.
select Run_Date
,Source
,Differences
from
(select Run_Date
,Source
,abs(Total_Count-lag(Total_Count) over(partition by source order by Entity)) as Differences
,row_number() over(partition by source order by Entity desc) as rn
from t
where Entity like '%staging%' or
Entity like '