Home > Enterprise >  How to remove garbage data from a temp table based on several criteria
How to remove garbage data from a temp table based on several criteria

Time:11-03

Sql Server version: Sql Server 2019 - 15.04138.2 Here is the script to generate the data in SQL Server:

CREATE TABLE #data 
(
    Device varchar(100), 
    Hall INT, 
    EquipNo INT, 
    LocNo INT, 
    HitCount INT, 
    Operator VARCHAR(100)
)

INSERT INTO #data
    VALUES 
          ('Tiger', 0, 0, 0, 0, null)
        , ('Tiger', 1, 0, 10, 0, NULL)
        , ('Tiger', 1, 5, 10, 0, NULL)
        , ('Tiger', 1, 5, 10, 0, NULL)
        , ('Tiger', 1, 5, 10, 3, NULL)
        , ('Tiger', 1, 5, 10, 3, 'Sam')  

        , ('Shark', 0, 0, 0, 0, null)
        , ('Shark', 2, 3, 0, 0, null) 
        , ('Shark', 2, 3, null, 5, null) 
        , ('Shark', 2, 3, 20, 2, null)
        , ('Shark', 2, 3, 20, 2, 'Alex')

        , ('Tiger', 0, 0, 0, 0, null)
        , ('Tiger', 1, 3, 0, 0, null) 
        , ('Tiger', 1, null, null, 5, null)
        , ('Tiger', 1, 3, 20, 10, 'Sam') 
        , ('Tiger', 1, 3, 20, 2, 'Sam')

In the above data, a record is valid if it has values in Device, Hall, EquipNo and HitCount columns, other than zero or empty string. The data can be grouped logically by Device, Hall and EquipNo. If 2 records in a group have "Device, hall, EquipNo and HitCount" data then we need to select the record with the highest value for HitCount. But if the hitcount is the same then we should take the record with the most information.

The desired result is (order is not important):

Device Hall EquipNo LocNo HitCount Operator
Tiger 1 5 10 3 Sam
Shark 2 3 Null 5 Null
Tiger 1 3 20 10 Sam

As an additional clarification, please see the following image to see what data we should end up with:
records to be selected

Using another temporary table or more is ok to end up with the desired result.



UPDATE: Here is the updated script to create a temp table with test data and applying solution provided by Hogan:

CREATE TABLE #data 
(
    Device varchar(100), 
    Hall INT, 
    EquipNo INT, 
    LocNo INT, 
    HitCount INT, 
    Operator VARCHAR(100)
)

INSERT INTO #data
    VALUES 
          ('Tiger', 0, 0, 0, 0, null)
        , ('Tiger', 1, 0, 10, 0, NULL)
        , ('Tiger', 1, 5, 10, 0, NULL)
        , ('Tiger', 1, 5, 10, 0, NULL)
        , ('Tiger', 1, 5, 10, 3, NULL)
        , ('Tiger', 1, 5, 10, 3, 'Sam')  

        , ('Shark', 0, 0, 0, 0, null)
        , ('Shark', 2, 3, 0, 0, null) 
        , ('Shark', 2, 3, null, 5, null) 
        , ('Shark', 2, 3, 20, 2, null)
        , ('Shark', 2, 3, 20, 2, 'Alex')

        , ('Tiger', 0, 0, 0, 0, null)
        , ('Tiger', 1, 3, 0, 0, null) 
        , ('Tiger', 1, null, null, 5, null)
        , ('Tiger', 1, 3, 20, 10, 'Sam') 
        , ('Tiger', 1, 3, 20, 2, 'Sam')


SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
  SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator,
          ROW_NUMBER() OVER(
            PARTITION BY Device, Hall, EquipNo
            ORDER BY HitCount DESC, 
                     (CASE WHEN EquipNo  IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN LocNo    IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
          ) as RN
  FROM #data
) S
WHERE S.RN = 1

However, the result of running the query is:

Device Hall EquipNo LocNo HitCount Operator
Shark 0 0 0 0 null
Shark 2 3 null 5 null
Tiger 0 0 0 0 null
Tiger 1 null null 5 null
Tiger 1 0 10 0 null
Tiger 1 3 20 10 Sam
Tiger 1 5 10 3 Sam

Device must have a value that is not empty string. Hall and EquipNo also must have a value that is not zero.

So the desired result should be (order is not important):

Device Hall EquipNo LocNo HitCount Operator
Tiger 1 5 10 3 Sam
Shark 2 3 Null 5 Null
Tiger 1 3 20 10 Sam

Thx.

CodePudding user response:

You can use windowing functions to solve this

SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
  SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator,
          ROW_NUMBER() OVER(
            PARTITION BY Device, Hall, EquipNo
            ORDER BY HitCount DESC, 
                     (CASE WHEN EquipNo  IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN LocNo    IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
          ) as RN,
  FROM #temp
) S
WHERE S.RN = 1

As you can see we partition by Device, Hall, EquipNo as you requested. Then to pick the item we use the order by -- we first order with the largest HitCount being first and next by the sum of non-null values being the largest. This gives the results you asked for when we only select the first item (those with a RN of 1) from the sub-query.


Update based on comments

SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
  SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator,
          ROW_NUMBER() OVER(
            PARTITION BY Device
            ORDER BY HitCount DESC, 
                     (CASE WHEN EquipNo  IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN LocNo    IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
          ) as RN,
  FROM #temp
) S
WHERE S.RN = 1

CodePudding user response:

I'm answering my own question but I'm marking Hogan's answer as the correct one because I couldn't edit his answer and I got the solution through him and Maybe I wasn't able to clarify my question well. What was remaining is to ensure that Device, Hall and EquipNo have valid values which are Non-Null and non-zero numbers and non-empty strings. Here is the query I was looking for:

SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
  SELECT  Device, Hall, EquipNo, LocNo, HitCount, Operator,
          ROW_NUMBER() OVER(
            PARTITION BY Device, Hall, EquipNo
            ORDER BY HitCount DESC, 
                     (CASE WHEN EquipNo  IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN LocNo    IS NOT NULL THEN 1 ELSE 0 END  
                      CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
          ) as RN
  FROM #data
) S
WHERE S.RN = 1
    and Device <> '' AND Device IS NOT NULL
    and Hall <> 0 AND Hall IS NOT NULL
    and EquipNo <> 0 AND EquipNo IS NOT NULL

Thanks Hogan.

  • Related