so I have a statement I believe should work... However it feels pretty suboptimal and I can't for the life of me figure out how to optimise it.
I have the following tables:
- Transactions
- [Id] is
PRIMARY KEY IDENTITY
- [Hash] has a
UNIQUE
constraint - [BlockNumber] has an Index
- [Id] is
- Transfers
- [Id] is
PRIMARY KEY IDENTITY
- [TransactionId] is a Foreign Key referencing [Transactions].[Id]
- [Id] is
- TokenPrices
- [Id] is
PRIMARY KEY IDENTITY
- [Id] is
- TokenPriceAttempts
- [Id] is
PRIMARY KEY IDENTITY
- [TransferId] is a Foreign Key referencing [Transfers].[Id]
- [Id] is
What I want to do, is select all the transfers, with a few bits of data from their related transaction (one transaction to many transfers), where I don't currently have a price stored in TokenPrices related to that transfer.
In the first part of the query, I am getting a list of all the transfers, and calculating the DIFF between the nearest found token price. If one isn't found, this is null (what I ultimately want to select). I am allowing 3 hours either side of the transaction timestamp - if nothing is found within that timespan, it will be null.
Secondly, I am selecting from this set, ensuring first that diff is null as this means the price is missing, and finally that token price attempts either doesn't have an entry for attempting to get a price, or if it does than it has fewer than 5 attempts listed and the last attempt was more than a week ago.
The way I have laid this out results in essentially 3 of the same / similar SELECT statements within the WHERE clause, which feels hugely suboptimal...
How could I improve this approach?
WITH [transferDateDiff] AS
(
SELECT
[t1].[Id],
[t1].[TransactionId],
[t1].[From],
[t1].[To],
[t1].[Value],
[t1].[Type],
[t1].[ContractAddress],
[t1].[TokenId],
[t2].[Hash],
[t2].[Timestamp],
ABS(DATEDIFF(SECOND, [tp].[Timestamp], [t2].[Timestamp])) AS diff
FROM
[dbo].[Transfers] AS [t1]
LEFT JOIN
[dbo].[Transactions] AS [t2]
ON [t1].[TransactionId] = [t2].[Id]
LEFT JOIN
(
SELECT
*
FROM
[dbo].[TokenPrices]
)
AS [tp]
ON [tp].[ContractAddress] = [t1].[ContractAddress]
AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp])
AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp])
WHERE
[t1].[Type] < 2
)
SELECT
[tdd].[Id],
[tdd].[TransactionId],
[tdd].[From],
[tdd].[To],
[tdd].[Value],
[tdd].[Type],
[tdd].[ContractAddress],
[tdd].[TokenId],
[tdd].[Hash],
[tdd].[Timestamp]
FROM
[transferDateDiff] AS tdd
WHERE
[tdd].[diff] IS NULL AND
(
(
SELECT
COUNT(*)
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
)
= 0 OR
(
(
SELECT
COUNT(*)
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
)
< 5 AND
(
DATEDIFF(DAY,
(
SELECT
MAX([tpa].[Created])
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
),
CURRENT_TIMESTAMP
) >= 7
)
)
)
CodePudding user response:
Here is an attempt to help simplify. I stripped out all the [brackets] that really are not required unless you are running into something like a reserved keyword, or columns with spaces in their name (bad to begin with).
Anyhow, your main query had 3 instances of a select per ID. To eliminate that, I did a LEFT JOIN to a subquery that pulls all transfers of type < 2 AND JOINS to the price attempts ONCE. This way, the result will have already pre-aggregated the count(*) and Max(Created) done ONCE for the same basis of transfers in question with your WITH CTE declaration. So you dont have to keep running the 3 queries each time, and you dont have to query the entire table of ALL transfers, just those with same underlying type < 2 condition. The result subquery alias "PQ" (preQuery)
This now simplifies the readability of the outer WHERE clause from the redundant counts per Id.
WITH transferDateDiff AS
(
SELECT
t1.Id,
t1.TransactionId,
t1.From,
t1.To,
t1.Value,
t1.Type,
t1.ContractAddress,
t1.TokenId,
t2.Hash,
t2.Timestamp,
ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) AS diff
FROM
dbo.Transfers t1
LEFT JOIN dbo.Transactions t2
ON t1.TransactionId = t2.Id
LEFT JOIN dbo.TokenPrices tp
ON t1.ContractAddress = tp.ContractAddress
AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp)
AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp)
WHERE
t1.Type < 2
)
SELECT
tdd.Id,
tdd.TransactionId,
tdd.From,
tdd.To,
tdd.Value,
tdd.Type,
tdd.ContractAddress,
tdd.TokenId,
tdd.Hash,
tdd.Timestamp
FROM
transferDateDiff tdd
LEFT JOIN
( SELECT
t1.Id,
COUNT(*) Attempts,
MAX(tpa.Created) MaxCreated
FROM
dbo.Transfers t1
JOIN dbo.TokenPriceAttempts tpa
on t1.Id = tpa.TransferId
WHERE
t1.Type < 2
GROUP BY
t1.Id ) PQ
on tdd.Id = PQ.Id
WHERE
tdd.diff IS NULL
AND ( PQ.Attempts IS NULL
OR PQ.Attempts = 0
OR ( PQ.Attempts < 5
AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7
)
)
CodePudding user response:
I don't understand why you are doing this:
LEFT JOIN
(
SELECT
*
FROM
[dbo].[TokenPrices]
)
AS [tp]
ON [tp].[ContractAddress] = [t1].[ContractAddress]
AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp])
AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp])
Isn't this a
LEFT JOIN [dbo].[TokenPrices] as TP ...
This:
SELECT
COUNT(*)
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
could be another CTE instead of being a sub... In fact any of your sub queries could be CTE's, that's part of a CTE is making things easier to read.
,TPA
AS
(
SELECT COUNT(*)
FROM [dbo].[TokenPriceAttempts] tpa
WHERE [tpa].[TransferId] = [tdd].[Id]
)