I'm trying to find the list of customers that own (or co-own) an account and the account number.
Our CLIENT table contains our client data, identified with a clientId (table also contains demographics and contact details).
Our ACCOUNTS table contains account information where a clientId can exist in 1 or 5 colums (one account can have up to 5 co-owners).
My code looks like this, but is agonisingly slow. Is this the correct approach? Are there other approaches that are less expensive?
Select c.*, aa.accountNo from
client as c, accounts as aa
where
exists
(select 1
from
accounts as a
where
Cast(a.Account_Date as Date) >= '2010-11-15' and
Cast(a.Account_Date as Date) <= '2017-04-24'
and c.clientId in (a.Owner1, a.Owner2, a.Owner3, a.Owner4, a.Owner5)
)
CodePudding user response:
The query you have explodes the data set and scans accounts twice. Also we shouldn't have to cast the column side to fit into a date range, there's now way that's beneficial (it might not always cause a scan but it's still not great). Try:
SELECT c.*, a.accountNo
FROM dbo.accounts AS a
CROSS APPLY
(
VALUES(Owner1),(Owner2),(Owner3),(Owner4),(Owner5)
) AS ac(clientId)
INNER JOIN dbo.client AS c
ON c.clientId = ac.clientId
WHERE a.Account_Date >= '20101115'
AND a.Account_Date < '20170425';
- Example db<>fiddle
CodePudding user response:
Your primary issue is lack of normalization. You should not have five Owner
columns. Instead you should have a separate table of AccountOwner
. Then you can simply join it.
This is effectively what you get from @AaronBetrand's answer, except that that one cannot be indexed as it is virtual.
Note also:
- There was no need to access
accounts
again in the subquery. - Always use explicit join syntax, not the implicit
,
syntax. - Never cast a column in order to filter or join it. Always cast the constants instead. In this case you don't even need to cast them.
SELECT
c.*,
a.accountNo
FROM Client AS c
JOIN AccountOwner AS ao ON ao.OwnerId = c.ClientId
JOIN Accounts AS a
ON a.AccountNo = ao.AccountNo
AND a.Account_Date >= '20101115'
AND a.Account_Date < '20170425';
For this query to work efficiently, you will need the following indexes
Account (Account_Date, AccountNo)
AccountOwner (AccountNo, OwnerId)
Client (ClientId) INCLUDE (OtherColumns)
An alternative set of indexes may prove a better access strategy (you need to test)
Account (AccountNo) INCLUDE (Account_Date)
AccountOwner (OwnerId, AccountNo)
Client (ClientId) INCLUDE (OtherColumns)