Why does OR in subquery make query so much slower?-CodePudding

I'm using MySQL and have the following query that I was trying to improve:

SELECT
    *
FROM
    overpayments AS op
    JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
    AND overpayment_pa.allocatable_type = 'Overpayment'
    JOIN (
        SELECT
            pa.payment_source_type,
            pa.payment_source_id,
            ft.conversion_rate
        FROM
            payment_allocations AS pa
            LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
            LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
            LEFT JOIN financial_transactions AS ft ON (
                ft.commercial_document_id = pa.payment_source_id
                AND ft.commercial_document_type = pa.payment_source_type
            )
            OR (
                ft.commercial_document_id = cn.id
                AND ft.commercial_document_type = 'CreditNote'
            )
        WHERE
            pa.allocatable_type = 'Overpayment'
            AND pa.company_id = 14792
            AND ft.company_id = 14792
    ) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
    AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;

It takes 10s to run. I was able to improve to 0.047s it by removing the OR statement in the subquery and using COALESCE to get the result:

SELECT
    *
FROM
    overpayments AS op
    JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
    AND overpayment_pa.allocatable_type = 'Overpayment'
    JOIN (
        SELECT
            pa.payment_source_type,
            pa.payment_source_id,
            coalesce(ft_one.conversion_rate, ft_two.conversion_rate)
        FROM
            payment_allocations AS pa
            LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
            LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
            LEFT JOIN financial_transactions AS ft_one ON (
                ft_one.commercial_document_id = pa.payment_source_id
                AND ft_one.commercial_document_type = pa.payment_source_type
                AND ft_one.company_id = 14792
            )
            LEFT JOIN financial_transactions AS ft_two ON (
                ft_two.commercial_document_id = cn.id
                AND ft_two.commercial_document_type = 'CreditNote'
                AND ft_two.company_id = 14792
            )
        WHERE
            pa.allocatable_type = 'Overpayment'
            AND pa.company_id = 14792
            
    ) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
    AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;

However, I don't really understand why that worked? The original sub query ran very quickly and only returned 2 results, so why would it slow down the query by so much? Explain on the first query returns the following:

# id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra	FIELD13
1	SIMPLE	pa	ref	index_payment_allocations_on_payment_source_id	index_payment_allocations_on_company_id	index_payment_allocations_on_company_id	5	const	191	10.00	Using where
1	SIMPLE	overpayment_pa	ref	index_payment_allocations_on_payment_source_id	index_payment_allocations_on_allocatable_id	index_payment_allocations_on_payment_source_id	5	rails.pa.payment_source_id	1	3.42	Using where
1	SIMPLE	op	eq_ref	PRIMARY	PRIMARY	4	rails.overpayment_pa.allocatable_id	1	100.00
1	SIMPLE	li	eq_ref	PRIMARY	PRIMARY	4	rails.pa.payment_source_id	1	100.00
1	SIMPLE	cn	eq_ref	PRIMARY	PRIMARY	8	rails.li.parent_document_id	1	100.00	Using where; Using index
1	SIMPLE	ft	ALL	transactions_unique_by_commercial_doc				12587878	0.00	Range checked for each record (index map: 0x2)

And for the second I get the following:

# id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra	FIELD13	FIELD14
1	SIMPLE	pa	ref	index_payment_allocations_on_payment_source_id	index_payment_allocations_on_company_id	index_payment_allocations_on_company_id	5	const	191	10.00	Using where
1	SIMPLE	overpayment_pa	ref	index_payment_allocations_on_payment_source_id	index_payment_allocations_on_allocatable_id	index_payment_allocations_on_payment_source_id	5	rails.pa.payment_source_id	1	3.42	Using where
1	SIMPLE	op	eq_ref	PRIMARY	PRIMARY	4	rails.overpayment_pa.allocatable_id	1	100.00
1	SIMPLE	ft_one	ref	transactions_unique_by_commercial_doc	index_financial_transactions_on_company_id	transactions_unique_by_commercial_doc	773	rails.pa.payment_source_id	rails.pa.payment_source_type	1	100.00	Using where
1	SIMPLE	li	eq_ref	PRIMARY	PRIMARY	4	rails.pa.payment_source_id	1	100.00
1	SIMPLE	cn	eq_ref	PRIMARY	PRIMARY	8	rails.li.parent_document_id	1	100.00	Using where; Using index
1	SIMPLE	ft_two	ref	transactions_unique_by_commercial_doc	index_financial_transactions_on_company_id	transactions_unique_by_commercial_doc	773	rails.cn.id	const	1	100.00	Using where

but I don't really know how to interpret those results.

CodePudding user response：

Look at the right side of the last row of your first EXPLAIN. It didn't use an index, and it had to scan through megarows. That's slow. Your second query used indexes for every step of the query, so it was much faster.

If your second query yields correct results, use it and don't look back. Congratulations! You've optimized a query.

OR operations, especially in ON clauses, are harder than usual for the query planner module to satisfy, because they often mean it has to take the union of two separate subqueries. It looks like the planner chose to brute-force it in your case. (brute force === scanning many rows.)

Without knowing your indexes, it's hard to help you further.

Read this to learn more. https://use-the-index-luke.com

CodePudding user response：

These may further speed up the second formulation:

overpayment_pa:
    INDEX(payment_source_id, payment_source_type, allocatable_type, allocatable_id)
pa: INDEX(allocatable_type, company_id, payment_source_id,  payment_source_type)
financial_transactions:
    INDEX(commercial_document_id, commercial_document_type, company_id, conversion_rate)