Sql Server version: Sql Server 2019 - 15.04138.2 Here is the script to generate the data in SQL Server:
CREATE TABLE #data
(
Device varchar(100),
Hall INT,
EquipNo INT,
LocNo INT,
HitCount INT,
Operator VARCHAR(100)
)
INSERT INTO #data
VALUES
('Tiger', 0, 0, 0, 0, null)
, ('Tiger', 1, 0, 10, 0, NULL)
, ('Tiger', 1, 5, 10, 0, NULL)
, ('Tiger', 1, 5, 10, 0, NULL)
, ('Tiger', 1, 5, 10, 3, NULL)
, ('Tiger', 1, 5, 10, 3, 'Sam')
, ('Shark', 0, 0, 0, 0, null)
, ('Shark', 2, 3, 0, 0, null)
, ('Shark', 2, 3, null, 5, null)
, ('Shark', 2, 3, 20, 2, null)
, ('Shark', 2, 3, 20, 2, 'Alex')
, ('Tiger', 0, 0, 0, 0, null)
, ('Tiger', 1, 3, 0, 0, null)
, ('Tiger', 1, null, null, 5, null)
, ('Tiger', 1, 3, 20, 10, 'Sam')
, ('Tiger', 1, 3, 20, 2, 'Sam')
In the above data, a record is valid if it has values in Device, Hall, EquipNo and HitCount columns, other than zero or empty string. The data can be grouped logically by Device, Hall and EquipNo. If 2 records in a group have "Device, hall, EquipNo and HitCount" data then we need to select the record with the highest value for HitCount. But if the hitcount is the same then we should take the record with the most information.
The desired result is (order is not important):
Device | Hall | EquipNo | LocNo | HitCount | Operator |
---|---|---|---|---|---|
Tiger | 1 | 5 | 10 | 3 | Sam |
Shark | 2 | 3 | Null | 5 | Null |
Tiger | 1 | 3 | 20 | 10 | Sam |
As an additional clarification, please see the following image to see what data we should end up with:
records to be selected
Using another temporary table or more is ok to end up with the desired result.
UPDATE: Here is the updated script to create a temp table with test data and applying solution provided by Hogan:
CREATE TABLE #data
(
Device varchar(100),
Hall INT,
EquipNo INT,
LocNo INT,
HitCount INT,
Operator VARCHAR(100)
)
INSERT INTO #data
VALUES
('Tiger', 0, 0, 0, 0, null)
, ('Tiger', 1, 0, 10, 0, NULL)
, ('Tiger', 1, 5, 10, 0, NULL)
, ('Tiger', 1, 5, 10, 0, NULL)
, ('Tiger', 1, 5, 10, 3, NULL)
, ('Tiger', 1, 5, 10, 3, 'Sam')
, ('Shark', 0, 0, 0, 0, null)
, ('Shark', 2, 3, 0, 0, null)
, ('Shark', 2, 3, null, 5, null)
, ('Shark', 2, 3, 20, 2, null)
, ('Shark', 2, 3, 20, 2, 'Alex')
, ('Tiger', 0, 0, 0, 0, null)
, ('Tiger', 1, 3, 0, 0, null)
, ('Tiger', 1, null, null, 5, null)
, ('Tiger', 1, 3, 20, 10, 'Sam')
, ('Tiger', 1, 3, 20, 2, 'Sam')
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator,
ROW_NUMBER() OVER(
PARTITION BY Device, Hall, EquipNo
ORDER BY HitCount DESC,
(CASE WHEN EquipNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN LocNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
) as RN
FROM #data
) S
WHERE S.RN = 1
However, the result of running the query is:
Device | Hall | EquipNo | LocNo | HitCount | Operator |
---|---|---|---|---|---|
Shark | 0 | 0 | 0 | 0 | null |
Shark | 2 | 3 | null | 5 | null |
Tiger | 0 | 0 | 0 | 0 | null |
Tiger | 1 | null | null | 5 | null |
Tiger | 1 | 0 | 10 | 0 | null |
Tiger | 1 | 3 | 20 | 10 | Sam |
Tiger | 1 | 5 | 10 | 3 | Sam |
Device must have a value that is not empty string. Hall and EquipNo also must have a value that is not zero.
So the desired result should be (order is not important):
Device | Hall | EquipNo | LocNo | HitCount | Operator |
---|---|---|---|---|---|
Tiger | 1 | 5 | 10 | 3 | Sam |
Shark | 2 | 3 | Null | 5 | Null |
Tiger | 1 | 3 | 20 | 10 | Sam |
Thx.
CodePudding user response:
You can use windowing functions to solve this
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator,
ROW_NUMBER() OVER(
PARTITION BY Device, Hall, EquipNo
ORDER BY HitCount DESC,
(CASE WHEN EquipNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN LocNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
) as RN,
FROM #temp
) S
WHERE S.RN = 1
As you can see we partition by Device, Hall, EquipNo as you requested. Then to pick the item we use the order by -- we first order with the largest HitCount being first and next by the sum of non-null values being the largest. This gives the results you asked for when we only select the first item (those with a RN of 1) from the sub-query.
Update based on comments
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator,
ROW_NUMBER() OVER(
PARTITION BY Device
ORDER BY HitCount DESC,
(CASE WHEN EquipNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN LocNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
) as RN,
FROM #temp
) S
WHERE S.RN = 1
CodePudding user response:
I'm answering my own question but I'm marking Hogan's answer as the correct one because I couldn't edit his answer and I got the solution through him and Maybe I wasn't able to clarify my question well. What was remaining is to ensure that Device, Hall and EquipNo have valid values which are Non-Null and non-zero numbers and non-empty strings. Here is the query I was looking for:
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator
FROM (
SELECT Device, Hall, EquipNo, LocNo, HitCount, Operator,
ROW_NUMBER() OVER(
PARTITION BY Device, Hall, EquipNo
ORDER BY HitCount DESC,
(CASE WHEN EquipNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN LocNo IS NOT NULL THEN 1 ELSE 0 END
CASE WHEN Operator IS NOT NULL THEN 1 ELSE 0 END) DESC
) as RN
FROM #data
) S
WHERE S.RN = 1
and Device <> '' AND Device IS NOT NULL
and Hall <> 0 AND Hall IS NOT NULL
and EquipNo <> 0 AND EquipNo IS NOT NULL
Thanks Hogan.