Home > Back-end >  Simplifying SELECT statement
Simplifying SELECT statement

Time:03-31

so I have a statement I believe should work... However it feels pretty suboptimal and I can't for the life of me figure out how to optimise it.

I have the following tables:

  • Transactions
    • [Id] is PRIMARY KEY IDENTITY
    • [Hash] has a UNIQUE constraint
    • [BlockNumber] has an Index
  • Transfers
    • [Id] is PRIMARY KEY IDENTITY
    • [TransactionId] is a Foreign Key referencing [Transactions].[Id]
  • TokenPrices
    • [Id] is PRIMARY KEY IDENTITY
  • TokenPriceAttempts
    • [Id] is PRIMARY KEY IDENTITY
    • [TransferId] is a Foreign Key referencing [Transfers].[Id]

What I want to do, is select all the transfers, with a few bits of data from their related transaction (one transaction to many transfers), where I don't currently have a price stored in TokenPrices related to that transfer.

In the first part of the query, I am getting a list of all the transfers, and calculating the DIFF between the nearest found token price. If one isn't found, this is null (what I ultimately want to select). I am allowing 3 hours either side of the transaction timestamp - if nothing is found within that timespan, it will be null.

Secondly, I am selecting from this set, ensuring first that diff is null as this means the price is missing, and finally that token price attempts either doesn't have an entry for attempting to get a price, or if it does than it has fewer than 5 attempts listed and the last attempt was more than a week ago.

The way I have laid this out results in essentially 3 of the same / similar SELECT statements within the WHERE clause, which feels hugely suboptimal...

How could I improve this approach?

        WITH [transferDateDiff] AS 
        (
            SELECT
                [t1].[Id],
                [t1].[TransactionId],
                [t1].[From],
                [t1].[To],
                [t1].[Value],
                [t1].[Type],
                [t1].[ContractAddress],
                [t1].[TokenId],
                [t2].[Hash],
                [t2].[Timestamp],
                ABS(DATEDIFF(SECOND, [tp].[Timestamp], [t2].[Timestamp])) AS diff 
            FROM
                [dbo].[Transfers] AS [t1]
                LEFT JOIN
                    [dbo].[Transactions] AS [t2] 
                    ON [t1].[TransactionId] = [t2].[Id]
                LEFT JOIN
                    (
                        SELECT
                            * 
                        FROM
                            [dbo].[TokenPrices]
                    )
                    AS [tp] 
                    ON [tp].[ContractAddress] = [t1].[ContractAddress]
                    AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp]) 
                    AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp]) 
            WHERE
                [t1].[Type] < 2
        )


        SELECT
            [tdd].[Id],
            [tdd].[TransactionId],
            [tdd].[From],
            [tdd].[To],
            [tdd].[Value],
            [tdd].[Type],
            [tdd].[ContractAddress],
            [tdd].[TokenId],
            [tdd].[Hash],
            [tdd].[Timestamp]
        FROM
            [transferDateDiff] AS tdd
        WHERE
            [tdd].[diff] IS NULL AND
            (
                (
                    SELECT
                        COUNT(*)
                    FROM
                        [dbo].[TokenPriceAttempts] tpa
                    WHERE
                        [tpa].[TransferId] = [tdd].[Id]
                )
                = 0 OR
                (
                    (
                        SELECT
                            COUNT(*)
                        FROM
                            [dbo].[TokenPriceAttempts] tpa
                        WHERE
                            [tpa].[TransferId] = [tdd].[Id]
                    )
                    < 5 AND
                    (
                        DATEDIFF(DAY,
                            (
                                SELECT
                                    MAX([tpa].[Created])
                                FROM
                                    [dbo].[TokenPriceAttempts] tpa
                                WHERE
                                    [tpa].[TransferId] = [tdd].[Id]
                            ),
                            CURRENT_TIMESTAMP
                        ) >= 7
                    )
                )
            )

CodePudding user response:

Here is an attempt to help simplify. I stripped out all the [brackets] that really are not required unless you are running into something like a reserved keyword, or columns with spaces in their name (bad to begin with).

Anyhow, your main query had 3 instances of a select per ID. To eliminate that, I did a LEFT JOIN to a subquery that pulls all transfers of type < 2 AND JOINS to the price attempts ONCE. This way, the result will have already pre-aggregated the count(*) and Max(Created) done ONCE for the same basis of transfers in question with your WITH CTE declaration. So you dont have to keep running the 3 queries each time, and you dont have to query the entire table of ALL transfers, just those with same underlying type < 2 condition. The result subquery alias "PQ" (preQuery)

This now simplifies the readability of the outer WHERE clause from the redundant counts per Id.

WITH transferDateDiff AS 
(
SELECT
        t1.Id,
        t1.TransactionId,
        t1.From,
        t1.To,
        t1.Value,
        t1.Type,
        t1.ContractAddress,
        t1.TokenId,
        t2.Hash,
        t2.Timestamp,
        ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) AS diff 
    FROM
        dbo.Transfers t1
            LEFT JOIN dbo.Transactions t2
                ON t1.TransactionId = t2.Id
                LEFT JOIN dbo.TokenPrices tp
                    ON t1.ContractAddress = tp.ContractAddress
                    AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp)
                    AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp)
    WHERE
        t1.Type < 2
)


SELECT
        tdd.Id,
        tdd.TransactionId,
        tdd.From,
        tdd.To,
        tdd.Value,
        tdd.Type,
        tdd.ContractAddress,
        tdd.TokenId,
        tdd.Hash,
        tdd.Timestamp
    FROM
        transferDateDiff tdd
            LEFT JOIN
            ( SELECT
                    t1.Id,
                    COUNT(*) Attempts,
                    MAX(tpa.Created) MaxCreated
                FROM
                    dbo.Transfers t1
                        JOIN dbo.TokenPriceAttempts tpa
                            on t1.Id = tpa.TransferId
                WHERE
                    t1.Type < 2
                GROUP BY
                    t1.Id ) PQ
                on tdd.Id = PQ.Id
    WHERE
            tdd.diff IS NULL 
        AND (   PQ.Attempts IS NULL
            OR PQ.Attempts = 0
            OR (    PQ.Attempts < 5
                AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7
                )
            )

CodePudding user response:

I don't understand why you are doing this:

LEFT JOIN
(
    SELECT
        * 
    FROM
        [dbo].[TokenPrices]
)
AS [tp] 
ON [tp].[ContractAddress] = [t1].[ContractAddress]
AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp]) 
AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp]) 

Isn't this a

LEFT JOIN [dbo].[TokenPrices] as TP ...

This:

SELECT
    COUNT(*)
FROM
    [dbo].[TokenPriceAttempts] tpa
WHERE
    [tpa].[TransferId] = [tdd].[Id]

could be another CTE instead of being a sub... In fact any of your sub queries could be CTE's, that's part of a CTE is making things easier to read.

,TPA
AS
(
    SELECT COUNT(*)
    FROM [dbo].[TokenPriceAttempts] tpa
    WHERE [tpa].[TransferId] = [tdd].[Id]
)
  • Related