I have 2 Location columns. I need to find a way to show only exceptions, where Location1 != Location2, BUT there are multiple values in Location2 column that can be associated with value in Lcoation1.
For example: San Diego in Location1 column can have 4 possible values in Location2 column (North, Central, South, San Diego) and such rows need to be excluded from the run. If San Diego in Location1 column and Seattle (for example) in Location2 column, that is considered as Exception.
create table Locations
(Id INT, Location1 VARCHAR(30), Location2 VARCHAR(30));
INSERT INTO Locations
VALUES
(1, 'San Diego', 'North'),
(2, 'San Diego', 'South'),
(3, 'San Diego', 'Central'),
(4, 'San Diego', 'Arizona'),
(5, 'San Diego', 'San Diego'),
(6, 'San Diego', 'Seattle'),
(7, 'San Diego', 'North'),
(8, 'San Diego', 'San Diego'),
(9, 'San Diego', 'Central'),
(10, 'San Diego', 'South')
SELECT * FROM Locations
I can exclude only obvious condition where Location 1 = Location 2:
SELECT *
FROM Locations
WHERE Location1 != Location2
Expected Output:
Id | Lcoation1 | Location2 |
---|---|---|
4 | San Diego | Arizona |
6 | San Diego | Seattle |
CodePudding user response:
WITH TIES
in concert with the window function sum() over()
could be an option here
Example
SELECT Top 1 with ties *
FROM Locations
Order By SUM(1) OVER (PARTITION BY LOCATION2)
Results
Id Location1 Location2
4 San Diego Arizona
6 San Diego Seattle
Note: The PARTITION BY
could include LOCATION1
... It is hard to tell with your sample data:
... SUM(1) OVER (PARTITION BY LOCATION1,LOCATION2)
CodePudding user response:
If you are willing to do some manual (ongoing) maintenance, you could create a mapping table of valid Location1
-Location2
relations and filter Locations
based on the mapping table. This way you define what is valid and you can easily find things that aren't.
DROP TABLE IF EXISTS locations;
CREATE TABLE Locations
(Id INT, Location1 VARCHAR(30), Location2 VARCHAR(30));
DROP TABLE IF EXISTS RegionMap;
CREATE TABLE RegionMap (Location1 VARCHAR(30), Location2 VARCHAR(30));
INSERT dbo.RegionMap (Location1, Location2)
VALUES ('San Diego', 'North'),
('San Diego', 'South'),
('San Diego', 'Central'),
('San Diego', 'San Diego');
INSERT INTO Locations
VALUES
(1, 'San Diego', 'North'),
(2, 'San Diego', 'South'),
(3, 'San Diego', 'Central'),
(4, 'San Diego', 'Arizona'),
(5, 'San Diego', 'San Diego'),
(6, 'San Diego', 'Seattle'),
(7, 'San Diego', 'North'),
(8, 'San Diego', 'San Diego'),
(9, 'San Diego', 'Central');
SELECT l.Location1, l.Location2
FROM locations l
LEFT JOIN dbo.RegionMap m ON m.Location1 = l.Location1
AND m.Location2 = l.Location2
WHERE m.Location2 IS NULL;
Note here that I removed ID 10 (the 2nd instance of San Diego South) and you still get the proper results.
The drawback here is that you will have to put a process in place to maintain the mapping table. If you run this query routinely and get a regular report of the output it'll help you to know what you might need to add to the mapping table.
But really, the best thing is to limit what the users can enter in as masterdata in the first place...