I have a large column store table that gets frequent updates. I don't ingest the updates directly into the source table because that would cause, in most cases, a small number of updates to cause a full table micro partition rebuild. Instead I stream updates to an update table, and at query time I combine both. In practice this works out well.
So simplify things, i'll throw this in a view users_view
.
CREATE OR REPLACE VIEW users_view AS (
SELECT * FROM users
UNION ALL
SELECT * FROM user_changes
QUALIFY ROW_NUMBER() OVER(
PARTITION BY id
ORDER BY last_updated_at DESC
) = 1
)
Both the users
table and user_changes
table have the same scheme as well as the some partition configuration. This way I can use predicate pushdown on the view to only select users within the correct partition. Let's say this is the account_id
.
SELECT * FROM users_view
WHERE account_id = 1234
But the users
table is quite a bit larger than the user_changes
table, and I'd like to push even more predicates down to the users
table without pushing additional predicates down to the user_changes
table. Why? Because matching on the users
table, while 98% accurate, has false positives/negatives. The details from the user_changes
are needed to set the record straight. What this would look like outside of a view is this:
SELECT * FROM (
SELECT * FROM users
WHERE account_id = 1234 AND city = 'Chicago'
UNION ALL
SELECT * FROM user_changes
WHERE account_id = 1234
QUALIFY ROW_NUMBER() OVER(
PARTITION BY id
ORDER BY last_updated_at DESC
) = 1
)
WHERE account_id = 1234 AND city = 'Chicago'
As nasty as this looks, it's much more performant. All conditions can be applied to the much larger users
table, but only unchanging conditions can be applied to the users_changes
table. i.e. A user may change cities, but a user can't change accounts. The second run of all conditions after the union is to catch any changes the user_changes
introduced.
This is cumbersome to write, and even more so as the query becomes complicated and query builders get involved. So i'm looking for way way to convince the sql planner to skip predicate pushdown of some predicates on my user_changes
table without needing to format the query like this. Ideally with a view.
PSUEDO SQL. PSUEDO SQL. PSUEDO SQL
In my wildest dreams I could tell the query planner where it can use partition predicates, and where it can use non-partition predicates.
CREATE OR REPLACE VIEW users_view AS (
SELECT * FROM (
SELECT * FROM users
%PARTITION_PREDICATES%
%NON_PARTITION_PREDICATES%
UNION ALL
SELECT * FROM user_changes
%PARTITION_PREDICATES%
QUALIFY ROW_NUMBER() OVER(
PARTITION BY id
ORDER BY last_updated_at DESC
) = 1
)
%PARTITION_PREDICATES%
%NON_PARTITION_PREDICATES%
)
SELECT * FROM users_view
WHERE account_id = 1234 AND city = 'Chicago'
Any crazy ideas?
CodePudding user response:
you can add additional column src
for determining the source table and wrap predicates in the CASE:
select * from
(
SELECT u.*, 'users' as src FROM users u
union all
SELECT uc.*, 'users_changes' as src FROM users_changes uc
)
WHERE --applied only to users
case when src = 'users'
then city = 'Chicago' --predicate wrapped in case
else true
end
--applied to all
AND account=12345