Home > Blockchain >  SQL query group where one column is missing specific data
SQL query group where one column is missing specific data

Time:01-17

I have an Oracle SQL database that consists of order book information, e.g.

ORDER_ID TIMESTAMP OPERATION ORDER_STATUS ...
1 00:00:01 INSERT New ...
1 00:00:05 UPDATE Partially Filled ...
2 00:00:07 UPDATE Partially Filled ...
1 00:00:08 CANCEL Filled ...
3 00:00:08 INSERT NEW ...

Now, this data is not always perfectly filled, as in, sometimes there will be order_ids with missing information.

One such problem I'm trying to identify are orders (so, all orders with the same order_ID) that are missing the Operation 'INSERT', e.g. there are order_ids that have the operation 'UPDATE' or 'CANCEL', but not an 'INSERT'. In the example table above, that would be the case for order_id 2 (it has an 'UPDATE', but no 'INSERT')

I was able to perform this analysis by just download the whole data set and analyzing the data with Python, but I'd like to be able to do this (faster) with a direct SQL query, if possible.

CodePudding user response:

You may use conditional aggregation to check if some status exists along with others.

select order_id
from sample
where operation in ('INSERT', 'UPDATE', 'CANCEL', 'OTHER_STATUS_TO_CHECK_TOGETHER_WITH_INSERT')
group by order_id
having count(case operation when 'INSERT' then 1 end) = 0

For your sample data it returns

ORDER_ID
2

fiddle

CodePudding user response:

See if the following helps:

Select Order_Id
from t
where Operation in ('UPDATE','CANCEL')
and not exists (
  select * from t t2
  where t2.Order_Id = t.Order_Id and t2.Operation = 'INSERT'
);

CodePudding user response:

From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching to determine whether there is an INSERT operation before an UPDATE or CANCEL operation:

SELECT order_id
FROM   order_book
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER     BY timestamp
  MEASURES
    COUNT(insert_op.operation) AS has_insert
  PATTERN ( insert_op? (update_op|cancel_op)  )
  DEFINE
    insert_op AS operation = 'INSERT',
    update_op AS operation = 'UPDATE',
    cancel_op AS operation = 'CANCEL'
)
WHERE has_insert = 0;

Which, for the sample data:

CREATE TABLE order_book (ORDER_ID, TIMESTAMP, OPERATION, ORDER_STATUS ) AS
SELECT 1, '00:00:01', 'INSERT', 'New'              FROM DUAL UNION ALL
SELECT 1, '00:00:05', 'UPDATE', 'Partially Filled' FROM DUAL UNION ALL
SELECT 2, '00:00:07', 'UPDATE', 'Partially Filled' FROM DUAL UNION ALL
SELECT 1, '00:00:08', 'CANCEL', 'Filled'           FROM DUAL UNION ALL
SELECT 3, '00:00:08', 'INSERT', 'NEW'              FROM DUAL;

Outputs:

ORDER_ID
2

Or, if you want groups where an UPDATE or CANCEL operation is first in the group then you can use:

SELECT order_id
FROM   order_book
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER     BY timestamp
  PATTERN ( ^ (update_op|cancel_op) )
  DEFINE
    update_op AS operation = 'UPDATE',
    cancel_op AS operation = 'CANCEL'
);

Which outputs the same.

fiddle

  • Related