Home > Back-end >  Selective Predicate Pushdown To View
Selective Predicate Pushdown To View

Time:11-24

I have a large column store table that gets frequent updates. I don't ingest the updates directly into the source table because that would cause, in most cases, a small number of updates to cause a full table micro partition rebuild. Instead I stream updates to an update table, and at query time I combine both. In practice this works out well.

So simplify things, i'll throw this in a view users_view.

CREATE OR REPLACE VIEW users_view AS (
    SELECT * FROM users
    UNION ALL 
    SELECT * FROM user_changes
    QUALIFY ROW_NUMBER() OVER(
        PARTITION BY id 
        ORDER BY last_updated_at DESC
    ) = 1
)

Both the users table and user_changes table have the same scheme as well as the some partition configuration. This way I can use predicate pushdown on the view to only select users within the correct partition. Let's say this is the account_id.

SELECT * FROM users_view
WHERE account_id = 1234

But the users table is quite a bit larger than the user_changes table, and I'd like to push even more predicates down to the users table without pushing additional predicates down to the user_changes table. Why? Because matching on the users table, while 98% accurate, has false positives/negatives. The details from the user_changes are needed to set the record straight. What this would look like outside of a view is this:

SELECT * FROM (
    SELECT * FROM users
    WHERE account_id = 1234 AND city = 'Chicago'
    UNION ALL 
    SELECT * FROM user_changes
    WHERE account_id = 1234
    QUALIFY ROW_NUMBER() OVER(
        PARTITION BY id 
        ORDER BY last_updated_at DESC
    ) = 1
)
WHERE account_id = 1234 AND city = 'Chicago'

As nasty as this looks, it's much more performant. All conditions can be applied to the much larger users table, but only unchanging conditions can be applied to the users_changes table. i.e. A user may change cities, but a user can't change accounts. The second run of all conditions after the union is to catch any changes the user_changes introduced.

This is cumbersome to write, and even more so as the query becomes complicated and query builders get involved. So i'm looking for way way to convince the sql planner to skip predicate pushdown of some predicates on my user_changes table without needing to format the query like this. Ideally with a view.

PSUEDO SQL. PSUEDO SQL. PSUEDO SQL

In my wildest dreams I could tell the query planner where it can use partition predicates, and where it can use non-partition predicates.

CREATE OR REPLACE VIEW users_view AS (
    SELECT * FROM (
        SELECT * FROM users
        %PARTITION_PREDICATES%
        %NON_PARTITION_PREDICATES%

        UNION ALL 

        SELECT * FROM user_changes
        %PARTITION_PREDICATES%

        QUALIFY ROW_NUMBER() OVER(
            PARTITION BY id 
            ORDER BY last_updated_at DESC
        ) = 1
    )
    %PARTITION_PREDICATES%
    %NON_PARTITION_PREDICATES%
)

SELECT * FROM users_view
WHERE account_id = 1234 AND city = 'Chicago'

Any crazy ideas?

CodePudding user response:

you can add additional column src for determining the source table and wrap predicates in the CASE:

select * from
(
SELECT u.*, 'users' as src FROM users u
union all
SELECT uc.*, 'users_changes' as src FROM users_changes uc
) 
WHERE --applied only to users
      case when src  = 'users' 
                 then city = 'Chicago' --predicate wrapped in case
           else true
       end
  --applied to all
  AND account=12345 
  • Related