I have data of the following format for which I am trying to preform comparison using SQL:
The data are ratings of "groups" with up to 4 values rated per group:
tblRatings
RatingID | GroupID | Value1 | Value2 | Value3 | Value4 |
---|---|---|---|---|---|
1 | 2222 | 13 | 19 | (null) | (null) |
2 | 2222 | 13 | (null) | (null) | (null) |
3 | 2223 | 1 | (null) | (null) | (null) |
4 | 2223 | 1 | (null) | (null) | (null) |
5 | 2224 | 5 | (null) | (null) | (null) |
6 | 2225 | 10 | 12 | 13 | (null) |
7 | 2225 | 12 | 13 | 10 | (null) |
My goal is to compare the records and determine which GroupIDs have two ratings that match, with a match defined as the same list of values in any order, with all nulls to be ignored. Thus in the example data, GroupIDs 2223 and 2225 have a match, and the others do not.
How would you proceed to perform this comparison?
As a first step, I have used a union query to normalize the data to one value per row, as follows:
qryRatingsNormalized
RatingID | GroupID | Value |
---|---|---|
1 | 2222 | 13 |
1 | 2222 | 19 |
2 | 2222 | 13 |
3 | 2223 | 1 |
4 | 2223 | 1 |
5 | 2224 | 5 |
6 | 2225 | 10 |
6 | 2225 | 12 |
6 | 2225 | 13 |
7 | 2225 | 12 |
7 | 2225 | 13 |
7 | 2225 | 10 |
However I'm unsure how to proceed from there.
FYI, I am working in MS Access using tables linked in SQL Server.
CodePudding user response:
If you are using SQL-server you can try to use CROSS APPLY
SELECT t1.RatingID,
t1.GroupID,
v.value
FROM T t1 CROSS APPLY (
VALUES
(Value1),
(Value2),
(Value3),
(Value4)
) v (value)
WHERE v.value IS NOT NULL
Or use UNION ALL
SELECT t1.*
FROM (
SELECT RatingID,GroupID,Value1 value
FROM T
UNION ALL
SELECT RatingID,GroupID,Value2
FROM T
UNION ALL
SELECT RatingID,GroupID,Value3
FROM T
UNION ALL
SELECT RatingID,GroupID,Value4
FROM T
) t1
WHERE value IS NOT NULL
CodePudding user response:
You could try to set up a pass-through query with tsql to get around the Access limitations:
with r as (
select GroupID, RatingID, val,
row_number() over (partition by RatingID order by val) as rn,
count(*) over (partition by RatingID) as cnt
from tblRatings cross apply
(values (Value1), (Value2), (Value3), (Value4)) as v (val)
where val is not null
)
select r1.RatingID, r2.RatingID, min(r1.GroupID) as GroupID
from r r1 left outer join r r2 on
r2.GroupID = r1.GroupID and r2.RatingID > r1.RatingID
and r2.cnt = r1.cnt
and r2.rn = r1.rn and r2.val = r1.val
group by r1.RatingID, r2.RatingID
having count(r2.val) = count(*) and count(*) = min(r2.cnt);
See working example here:
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=ab79c9bcafbfcad2efbfdeb5dc102420
CodePudding user response:
I haven't really kept up with the state of Access but I'm thinking something like this should be valid. You may be able to get away with matching on a set of aggregate values that stand in for the individual values.
select r1.RatingID, r2.RatingID, r1.GroupID
from
(
select
GroupID, RatingID,
count(Value) cnt, sum(Value) tot,
avg(Value) avg, min(Value) as lst, max(Value) as grt,
floor(sum(log(Value)) * 100000) as lg/* assumes no zeroes. a positive offset would likely fix that problem too */
from qryRatingsNormalized
group by GroupID, RatingID
) r1
inner join
(
select
GroupID, RatingID,
count(Value) cnt, sum(Value) tot,
avg(Value) avg, min(Value) as lst, max(Value) as grt,
floor(sum(log(Value)) * 100000) as lg
from qryRatingsNormalized
group by GroupID, RatingID
) r2 on r2.GroupID = r1.GroupID and r2.RatingID > r1.RatingID
and r2.cnt = r1.cnt and r2.lst = r1.lst and r2.grt = r1.grt
and r2.tot = r1.tot and r2.avg = r1.avg and r2.lg = r1.lg
I've repeated the same subquery twice so perhaps you'd need to just define that as a named Query/View to get this to work without the named subquery.
It's probably advantageous that your maximum is four values. Some of this probably hinges on whether you need to do this repeatedly, what the values are, how many groups you've got, whether zero is valid, whether negative numbers are valid...
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=ab79c9bcafbfcad2efbfdeb5dc102420