I have an Oracle SQL database that consists of order book information, e.g.
ORDER_ID | TIMESTAMP | OPERATION | ORDER_STATUS | ... |
---|---|---|---|---|
1 | 00:00:01 | INSERT | New | ... |
1 | 00:00:05 | UPDATE | Partially Filled | ... |
2 | 00:00:07 | UPDATE | Partially Filled | ... |
1 | 00:00:08 | CANCEL | Filled | ... |
3 | 00:00:08 | INSERT | NEW | ... |
Now, this data is not always perfectly filled, as in, sometimes there will be order_ids with missing information.
One such problem I'm trying to identify are orders (so, all orders with the same order_ID) that are missing the Operation 'INSERT', e.g. there are order_ids that have the operation 'UPDATE' or 'CANCEL', but not an 'INSERT'. In the example table above, that would be the case for order_id 2 (it has an 'UPDATE', but no 'INSERT')
I was able to perform this analysis by just download the whole data set and analyzing the data with Python, but I'd like to be able to do this (faster) with a direct SQL query, if possible.
CodePudding user response:
You may use conditional aggregation to check if some status exists along with others.
select order_id
from sample
where operation in ('INSERT', 'UPDATE', 'CANCEL', 'OTHER_STATUS_TO_CHECK_TOGETHER_WITH_INSERT')
group by order_id
having count(case operation when 'INSERT' then 1 end) = 0
For your sample data it returns
ORDER_ID |
---|
2 |
CodePudding user response:
See if the following helps:
Select Order_Id
from t
where Operation in ('UPDATE','CANCEL')
and not exists (
select * from t t2
where t2.Order_Id = t.Order_Id and t2.Operation = 'INSERT'
);
CodePudding user response:
From Oracle 12, you can use MATCH_RECOGNIZE
to perform row-by-row pattern matching to determine whether there is an INSERT
operation before an UPDATE
or CANCEL
operation:
SELECT order_id
FROM order_book
MATCH_RECOGNIZE(
PARTITION BY order_id
ORDER BY timestamp
MEASURES
COUNT(insert_op.operation) AS has_insert
PATTERN ( insert_op? (update_op|cancel_op) )
DEFINE
insert_op AS operation = 'INSERT',
update_op AS operation = 'UPDATE',
cancel_op AS operation = 'CANCEL'
)
WHERE has_insert = 0;
Which, for the sample data:
CREATE TABLE order_book (ORDER_ID, TIMESTAMP, OPERATION, ORDER_STATUS ) AS
SELECT 1, '00:00:01', 'INSERT', 'New' FROM DUAL UNION ALL
SELECT 1, '00:00:05', 'UPDATE', 'Partially Filled' FROM DUAL UNION ALL
SELECT 2, '00:00:07', 'UPDATE', 'Partially Filled' FROM DUAL UNION ALL
SELECT 1, '00:00:08', 'CANCEL', 'Filled' FROM DUAL UNION ALL
SELECT 3, '00:00:08', 'INSERT', 'NEW' FROM DUAL;
Outputs:
ORDER_ID |
---|
2 |
Or, if you want groups where an UPDATE
or CANCEL
operation is first in the group then you can use:
SELECT order_id
FROM order_book
MATCH_RECOGNIZE(
PARTITION BY order_id
ORDER BY timestamp
PATTERN ( ^ (update_op|cancel_op) )
DEFINE
update_op AS operation = 'UPDATE',
cancel_op AS operation = 'CANCEL'
);
Which outputs the same.