I have a table where I store timeseries data:
customer_id | transaction_type | transaction_date | transaction_value |
---|---|---|---|
1 | buy | 2022-12-04 | 100.0 |
1 | sell | 2022-12-04 | 80.0 |
2 | buy | 2022-12-04 | 120.0 |
2 | sell | 2022-12-03 | 120.0 |
1 | buy | 2022-12-02 | 90.0 |
1 | sell | 2022-12-02 | 70.0 |
2 | buy | 2022-12-01 | 110.0 |
2 | sell | 2022-12-01 | 110.0 |
Number of customers and transaction types is not limited. Currently there are over 10,000 customers and over 600 transaction types. Dates of transactions between customers can be unique and will not always align based on any criteria (that's why I've tried using LATERAL JOIN
— you'll see it later).
I want to filter those records to get customers IDs with the values of the transaction where any arbitrary condition is met. Number of those conditions in a query is not restricted to two — can be anything. For example:
Give me all customers who have a buy with value > $90 and a sale with value > 100$ as their latest transactions
The final query should return these two rows:
customer_id | transaction_type | transaction_date | transaction_value |
---|---|---|---|
2 | buy | 2022-12-04 | 120$ |
2 | sell | 2022-12-03 | 120$ |
The closest I've came to what I need was by creating a materialized view cross-joining customer IDs and transaction_types:
customer_id | transaction_type |
---|---|
1 | buy |
1 | sell |
2 | buy |
2 | sell |
And then running a LATERAL JOIN
between table with transactions and customer_transactions
materialized view:
SELECT *
FROM customer_transactions
JOIN LATERAL (
SELECT *
FROM transactions
WHERE (transactions.customer_id = customer_transactions.customer_id)
AND (transactions.transaction_type = customer_transactions.transaction_type)
AND transactions.transaction_date <= '2022-12-04' -- this can change for filtering records back in time
ORDER BY transactions.transaction_date DESC
LIMIT 1
) transactions ON TRUE
WHERE customer_transactions.transaction_type = 'buy'
AND customer_transactions.transaction_value > 90
It seems to be working when one condition is specified. But as soon as subsequential conditions are introduced that's where things start falling apart for me; changing condition to:
WHERE (customer_transactions.transaction_type = 'buy'
AND customer_transactions.transaction_value > 90)
AND (customer_transactions.transaction_type = 'sell'
AND customer_transactions.transaction_value > 100)
is obviously not going to work as there is no row that satisfies both of these conditions.
Is it possible to achieve this using the aproach I took? If so what am I missing? Or maybe there is another way to solve that would be more appropriate?
CodePudding user response:
You could use a CTE with row_number and chech out the last transactios
WITH CTE as (SELECT
"customer_id", "transaction_type", "transaction_date",
"transaction_value",
ROW_NUMBER() OVER(PARTITION BY "customer_id", "transaction_type" ORDER BY "transaction_date" DESC) rn
FROM tab1)
SELECT "customer_id", "transaction_type", "transaction_date",
"transaction_value" FROM CTE
WHERE rn = 1
AND CASE WHEN "transaction_type" = 'buy' THEN ("transaction_value" > 90)
WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100)
ELSE FALSE END
AND (SELECT COUNT(*) FROM CTE c1
WHERE c1."customer_id"= CTE."customer_id" and rn = 1
AND CASE WHEN "transaction_type" = 'buy' THEN ("transaction_value" > 90)
WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100)
ELSE FALSE END ) = 2
customer_id | transaction_type | transaction_date | transaction_value |
---|---|---|---|
2 | buy | 2022-12-04 | 120.0 |
2 | sell | 2022-12-03 | 120.0 |
SELECT 2
CodePudding user response:
Use distinct on with custom order to select all the latest transactions per user according to your several criteria (hence the OR
) - latest
CTE, then count the number of result records per user using count
as a window function - latest_with_count
CTE - and finally pick these that have a count equal to the number of criteria.
This may be a bit abstract template but hopefully helps with the generic problem. The approach would work for any number of conditions.
with t as
(
/*
your query here with several conditions in DISJUNCTION (OR) here, i.e.
WHERE (customer_transactions.transaction_type = 'buy' AND customer_transactions.transaction_value > 90)
OR (customer_transactions.transaction_type = 'sell' AND customer_transactions.transaction_value > 100)
*/
),
latest as
(
select distinct on (customer_id, transaction_type) *
from t
-- pick the latest per customer & type
order by customer_id, transaction_type, transaction_date desc
),
latest_with_count as
(
select *, count(*) over (partition by customer_id) cnt
from latest
)
select *
from latest_with_count
where cnt = 2 -- the number of criteria