I'm using MySQL and have the following query that I was trying to improve:
SELECT
*
FROM
overpayments AS op
JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
AND overpayment_pa.allocatable_type = 'Overpayment'
JOIN (
SELECT
pa.payment_source_type,
pa.payment_source_id,
ft.conversion_rate
FROM
payment_allocations AS pa
LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
LEFT JOIN financial_transactions AS ft ON (
ft.commercial_document_id = pa.payment_source_id
AND ft.commercial_document_type = pa.payment_source_type
)
OR (
ft.commercial_document_id = cn.id
AND ft.commercial_document_type = 'CreditNote'
)
WHERE
pa.allocatable_type = 'Overpayment'
AND pa.company_id = 14792
AND ft.company_id = 14792
) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;
It takes 10s to run. I was able to improve to 0.047s it by removing the OR statement in the subquery and using COALESCE to get the result:
SELECT
*
FROM
overpayments AS op
JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
AND overpayment_pa.allocatable_type = 'Overpayment'
JOIN (
SELECT
pa.payment_source_type,
pa.payment_source_id,
coalesce(ft_one.conversion_rate, ft_two.conversion_rate)
FROM
payment_allocations AS pa
LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
LEFT JOIN financial_transactions AS ft_one ON (
ft_one.commercial_document_id = pa.payment_source_id
AND ft_one.commercial_document_type = pa.payment_source_type
AND ft_one.company_id = 14792
)
LEFT JOIN financial_transactions AS ft_two ON (
ft_two.commercial_document_id = cn.id
AND ft_two.commercial_document_type = 'CreditNote'
AND ft_two.company_id = 14792
)
WHERE
pa.allocatable_type = 'Overpayment'
AND pa.company_id = 14792
) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;
However, I don't really understand why that worked? The original sub query ran very quickly and only returned 2 results, so why would it slow down the query by so much? Explain on the first query returns the following:
# id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | FIELD13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | pa | ref | index_payment_allocations_on_payment_source_id | index_payment_allocations_on_company_id | index_payment_allocations_on_company_id | 5 | const | 191 | 10.00 | Using where | |
1 | SIMPLE | overpayment_pa | ref | index_payment_allocations_on_payment_source_id | index_payment_allocations_on_allocatable_id | index_payment_allocations_on_payment_source_id | 5 | rails.pa.payment_source_id | 1 | 3.42 | Using where | |
1 | SIMPLE | op | eq_ref | PRIMARY | PRIMARY | 4 | rails.overpayment_pa.allocatable_id | 1 | 100.00 | |||
1 | SIMPLE | li | eq_ref | PRIMARY | PRIMARY | 4 | rails.pa.payment_source_id | 1 | 100.00 | |||
1 | SIMPLE | cn | eq_ref | PRIMARY | PRIMARY | 8 | rails.li.parent_document_id | 1 | 100.00 | Using where; Using index | ||
1 | SIMPLE | ft | ALL | transactions_unique_by_commercial_doc | 12587878 | 0.00 | Range checked for each record (index map: 0x2) |
And for the second I get the following:
# id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | FIELD13 | FIELD14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | pa | ref | index_payment_allocations_on_payment_source_id | index_payment_allocations_on_company_id | index_payment_allocations_on_company_id | 5 | const | 191 | 10.00 | Using where | ||
1 | SIMPLE | overpayment_pa | ref | index_payment_allocations_on_payment_source_id | index_payment_allocations_on_allocatable_id | index_payment_allocations_on_payment_source_id | 5 | rails.pa.payment_source_id | 1 | 3.42 | Using where | ||
1 | SIMPLE | op | eq_ref | PRIMARY | PRIMARY | 4 | rails.overpayment_pa.allocatable_id | 1 | 100.00 | ||||
1 | SIMPLE | ft_one | ref | transactions_unique_by_commercial_doc | index_financial_transactions_on_company_id | transactions_unique_by_commercial_doc | 773 | rails.pa.payment_source_id | rails.pa.payment_source_type | 1 | 100.00 | Using where | |
1 | SIMPLE | li | eq_ref | PRIMARY | PRIMARY | 4 | rails.pa.payment_source_id | 1 | 100.00 | ||||
1 | SIMPLE | cn | eq_ref | PRIMARY | PRIMARY | 8 | rails.li.parent_document_id | 1 | 100.00 | Using where; Using index | |||
1 | SIMPLE | ft_two | ref | transactions_unique_by_commercial_doc | index_financial_transactions_on_company_id | transactions_unique_by_commercial_doc | 773 | rails.cn.id | const | 1 | 100.00 | Using where |
but I don't really know how to interpret those results.
CodePudding user response:
Look at the right side of the last row of your first EXPLAIN. It didn't use an index, and it had to scan through megarows. That's slow. Your second query used indexes for every step of the query, so it was much faster.
If your second query yields correct results, use it and don't look back. Congratulations! You've optimized a query.
OR operations, especially in ON clauses, are harder than usual for the query planner module to satisfy, because they often mean it has to take the union of two separate subqueries. It looks like the planner chose to brute-force it in your case. (brute force === scanning many rows.)
Without knowing your indexes, it's hard to help you further.
Read this to learn more. https://use-the-index-luke.com
CodePudding user response:
These may further speed up the second formulation:
overpayment_pa:
INDEX(payment_source_id, payment_source_type, allocatable_type, allocatable_id)
pa: INDEX(allocatable_type, company_id, payment_source_id, payment_source_type)
financial_transactions:
INDEX(commercial_document_id, commercial_document_type, company_id, conversion_rate)