Home > Back-end >  Why does OR in subquery make query so much slower?
Why does OR in subquery make query so much slower?

Time:10-04

I'm using MySQL and have the following query that I was trying to improve:

SELECT
    *
FROM
    overpayments AS op
    JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
    AND overpayment_pa.allocatable_type = 'Overpayment'
    JOIN (
        SELECT
            pa.payment_source_type,
            pa.payment_source_id,
            ft.conversion_rate
        FROM
            payment_allocations AS pa
            LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
            LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
            LEFT JOIN financial_transactions AS ft ON (
                ft.commercial_document_id = pa.payment_source_id
                AND ft.commercial_document_type = pa.payment_source_type
            )
            OR (
                ft.commercial_document_id = cn.id
                AND ft.commercial_document_type = 'CreditNote'
            )
        WHERE
            pa.allocatable_type = 'Overpayment'
            AND pa.company_id = 14792
            AND ft.company_id = 14792
    ) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
    AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;

It takes 10s to run. I was able to improve to 0.047s it by removing the OR statement in the subquery and using COALESCE to get the result:

SELECT
    *
FROM
    overpayments AS op
    JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
    AND overpayment_pa.allocatable_type = 'Overpayment'
    JOIN (
        SELECT
            pa.payment_source_type,
            pa.payment_source_id,
            coalesce(ft_one.conversion_rate, ft_two.conversion_rate)
        FROM
            payment_allocations AS pa
            LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
            LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
            LEFT JOIN financial_transactions AS ft_one ON (
                ft_one.commercial_document_id = pa.payment_source_id
                AND ft_one.commercial_document_type = pa.payment_source_type
                AND ft_one.company_id = 14792
            )
            LEFT JOIN financial_transactions AS ft_two ON (
                ft_two.commercial_document_id = cn.id
                AND ft_two.commercial_document_type = 'CreditNote'
                AND ft_two.company_id = 14792
            )
        WHERE
            pa.allocatable_type = 'Overpayment'
            AND pa.company_id = 14792
            
    ) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
    AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;

However, I don't really understand why that worked? The original sub query ran very quickly and only returned 2 results, so why would it slow down the query by so much? Explain on the first query returns the following:

# id select_type table partitions type possible_keys key key_len ref rows filtered Extra FIELD13
1 SIMPLE pa ref index_payment_allocations_on_payment_source_id index_payment_allocations_on_company_id index_payment_allocations_on_company_id 5 const 191 10.00 Using where
1 SIMPLE overpayment_pa ref index_payment_allocations_on_payment_source_id index_payment_allocations_on_allocatable_id index_payment_allocations_on_payment_source_id 5 rails.pa.payment_source_id 1 3.42 Using where
1 SIMPLE op eq_ref PRIMARY PRIMARY 4 rails.overpayment_pa.allocatable_id 1 100.00
1 SIMPLE li eq_ref PRIMARY PRIMARY 4 rails.pa.payment_source_id 1 100.00
1 SIMPLE cn eq_ref PRIMARY PRIMARY 8 rails.li.parent_document_id 1 100.00 Using where; Using index
1 SIMPLE ft ALL transactions_unique_by_commercial_doc 12587878 0.00 Range checked for each record (index map: 0x2)

And for the second I get the following:

# id select_type table partitions type possible_keys key key_len ref rows filtered Extra FIELD13 FIELD14
1 SIMPLE pa ref index_payment_allocations_on_payment_source_id index_payment_allocations_on_company_id index_payment_allocations_on_company_id 5 const 191 10.00 Using where
1 SIMPLE overpayment_pa ref index_payment_allocations_on_payment_source_id index_payment_allocations_on_allocatable_id index_payment_allocations_on_payment_source_id 5 rails.pa.payment_source_id 1 3.42 Using where
1 SIMPLE op eq_ref PRIMARY PRIMARY 4 rails.overpayment_pa.allocatable_id 1 100.00
1 SIMPLE ft_one ref transactions_unique_by_commercial_doc index_financial_transactions_on_company_id transactions_unique_by_commercial_doc 773 rails.pa.payment_source_id rails.pa.payment_source_type 1 100.00 Using where
1 SIMPLE li eq_ref PRIMARY PRIMARY 4 rails.pa.payment_source_id 1 100.00
1 SIMPLE cn eq_ref PRIMARY PRIMARY 8 rails.li.parent_document_id 1 100.00 Using where; Using index
1 SIMPLE ft_two ref transactions_unique_by_commercial_doc index_financial_transactions_on_company_id transactions_unique_by_commercial_doc 773 rails.cn.id const 1 100.00 Using where

but I don't really know how to interpret those results.

CodePudding user response:

Look at the right side of the last row of your first EXPLAIN. It didn't use an index, and it had to scan through megarows. That's slow. Your second query used indexes for every step of the query, so it was much faster.

If your second query yields correct results, use it and don't look back. Congratulations! You've optimized a query.

OR operations, especially in ON clauses, are harder than usual for the query planner module to satisfy, because they often mean it has to take the union of two separate subqueries. It looks like the planner chose to brute-force it in your case. (brute force === scanning many rows.)

Without knowing your indexes, it's hard to help you further.

Read this to learn more. https://use-the-index-luke.com

CodePudding user response:

These may further speed up the second formulation:

overpayment_pa:
    INDEX(payment_source_id, payment_source_type, allocatable_type, allocatable_id)
pa: INDEX(allocatable_type, company_id, payment_source_id,  payment_source_type)
financial_transactions:
    INDEX(commercial_document_id, commercial_document_type, company_id, conversion_rate)
  • Related