I have performed a bunch of analysis on some client data and want to present the duplicates that I found across their systems.
I match on based on 4 separate criteria, Name, Address (multiple addresses and lines) and Phone (every phone number associated with the account) and Email (every email associated with the account).
I boil it all down into a single table.
CREATE TABLE [dbo].[dupe_Customers](
[id] [int] IDENTITY(1,1) NOT NULL,
[MatchOn] [varchar](255) NULL,
[DataID] [varchar](255) NULL,
[CustName] [varchar](255) NULL,
[Address] [varchar](255) NULL,
[Phone] [varchar](255) NULL,
[Fax] [varchar](255) NULL,
[Email] [varchar](255) NULL
)
When I present it, I want view it in a specific order and ordering within each dataset.
SELECT * FROM [dupe_Customers] WHERE [MatchOn] = 'Name' ORDER BY [CustName]
SELECT * FROM [dupe_Customers] WHERE [MatchOn] = 'Address' ORDER BY [Address]
SELECT * FROM [dupe_Customers] WHERE [MatchOn] = 'Phone' ORDER BY [Phone]
SELECT * FROM [dupe_Customers] WHERE [MatchOn] = 'Email' ORDER BY [Email]
If I use a UNION
I of course can't use the ORDER BY
and have to pick a single ordering for the entire dataset... Which defeats the purpose since I can just do that on the table as a whole already.
I know I can just output 4 diff times, but the goal is to have a single output so it can be loaded on a single tab in Excel. (If you know how to output multiple datasets to a single tab in Excel I'd love to know how to do that!)
The only thing I can think of is to put each into a new table with an id, something like:
SELECT identity (int,1,1) as [id], * INTO newTable FROM [dupe_Customers] WHERE [MatchOn] = 'Name' ORDER BY [CustName]
INSERT INTO newTable
SELECT * FROM [dupe_Customers] WHERE [MatchOn] = 'Address' ORDER BY [Address]
Then I'd just do a SELECT * FROM newTable ORDER BY [id]
In addition to feeling "hacky" I am not 100% confident that the data is actually inserted into the new table in the correct order.
This is about a midsize data set with about 500k rows. I am using Azure and it is 3-5x slower than when I was doing things on my local machine, so I'd like to avoid something too processor intensive that's going to take hours to run.
CodePudding user response:
You may order using a CASE
expression:
SELECT *
FROM [dupe_Customers]
WHERE [MatchOn] IN ('Name', 'Address', 'Phone', 'Email')
ORDER BY [MatchOn],
CASE [MatchOn]
WHEN 'Name' THEN CustName
WHEN 'Address' THEN Address
WHEN 'Phone' THEN Phone
WHEN 'Email' THEN Email END;