I use PostgreSQL 14 to work on a student_books
table which manages books borrowed by students. Each student can have zero or more books borrowed at any point in time. The table look like this (the order of rows doesn't matter).
student_id | book_id |
---|---|
1 | 113 |
2 | 37 |
5 | 94 |
Furthermore, I have an (append-only) library_ledger
table with recent transactions at the library. It keeps track of whether a student borrowed or returned a book, and when. The order of rows matters, it's sorted in ascending order on the first date
column:
date | action | student_id | book_id |
---|---|---|---|
2022-11-20 09:14:09 | borrow | 2 | 3 |
2022-11-21 17:43:22 | return | 1 | 113 |
2022-11-22 14:03:04 | borrow | 5 | 204 |
2022-11-22 14:03:08 | return | 5 | 94 |
2022-11-22 14:03:15 | return | 5 | 204 |
Given the student_books
and library_ledger
tables, I'd like to compute the new set of books borrowed by each student. In the above case, I'd like to get the result set
student_id | book_id |
---|---|
2 | 3 |
2 | 37 |
It's not difficult to write e.g. a Java program which processes each row in the library_ledger
and updates the student_books
table by issuing INSERT
/DELETE
queries. However, I wonder if this can be done in SQL directly.
Maybe if the initial table is grouped by student_id and aggregating the book IDs using array_agg
, one could use that as the starting value for an aggregate function which processes the actions in library_ledger
by transforming the array using either array_append
or array_remove
(depending on the value in the action
column). At the end, the result could be un-nested.
Is there maybe a simpler way to achieve this, possibly even without using a custom aggregate function?
CodePudding user response:
I'm not sure why you're considering an aggregate function here. I think the basic idea is something like:
Remove the records where the most recent library_ledger
action for that student/book is a "return"...
DELETE
FROM student_books
WHERE (student_id, book_id) IN
(
SELECT student_id, book_id
FROM
(
SELECT student_id, book_id, action, row_number() OVER (PARTITION BY student_id, book_id ORDER BY date DESC) rn
FROM library_ledger
) sq
WHERE rn = 1 AND action='return' -- Most recent action is "return"
)
... then add in any student/book pairs that don't already exist in student_books
and have a most recent action of "borrow"
INSERT INTO student_books
SELECT student_id, book_id
FROM
(
SELECT student_id, book_id, action, row_number() OVER (PARTITION BY student_id, book_id ORDER BY date DESC) rn
FROM library_ledger
) sq
WHERE rn = 1 AND action='borrow' -- Most recent action is "borrow"
AND (student_id, book_id) NOT IN -- Doesn't already exist in table
(
SELECT student_id, book_id
FROM student_books
)
I don't really see any reason to consider intermediate actions, since only the final one (for each student/book pair) should matter.
You can see my attempt in this fiddle.
CodePudding user response:
From the perspective of journal
and ledger
, there could be a need to understand between borrows and returns for auditing or accounting purpose. For example,
select student_id,
book_id,
action,
date,
count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
from library_ledger;
Gives us an idea of:
student_id|book_id|action|date |borrow_count|return_count|
---------- ------- ------ ----------------------- ------------ ------------
1| 113|return|2022-11-21 17:43:22.000| 0| 1|
2| 3|borrow|2022-11-20 09:14:09.000| 1| 0|
5| 94|return|2022-11-22 14:03:08.000| 0| 1|
5| 204|borrow|2022-11-22 14:03:04.000| 1| 0|
5| 204|return|2022-11-22 14:03:15.000| 1| 1|
Based on that, the SQL action is determined:
with cte as (
select student_id,
book_id,
action,
date,
count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
from library_ledger)
select student_id,
book_id,
max(borrow_count) as borrow_count,
max(return_count) as return_count,
case
when max(borrow_count) > max(return_count) then 'insert'
when max(borrow_count) < max(return_count) then 'delete'
else 'noop'
end as sql_action
from cte
group by 1,2;
We get:
student_id|book_id|borrow_count|return_count|sql_action|
---------- ------- ------------ ------------ ----------
1| 113| 0| 1|delete |
2| 3| 1| 0|insert |
5| 94| 0| 1|delete |
5| 204| 1| 1|noop |
For practices like accounting, there could be a review or an approval before apply those actions. After that, update student_books
(kind of like account balance).
-- 1. delete
with cte as (
select student_id,
book_id,
action,
date,
count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
from library_ledger),
cte_delete as (
select student_id,
book_id
from cte
group by 1,2
having max(borrow_count) < max(return_count))
delete from student_books sb
using cte_delete d
where sb.student_id = d.student_id
and sb.book_id = d.book_id;
select * from student_books;
-- 2. insert
with cte as (
select student_id,
book_id,
action,
date,
count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
from library_ledger),
cte_insert as (
select student_id,
book_id
from cte
group by 1,2
having max(borrow_count) > max(return_count))
insert into student_books
select student_id, book_id
from cte_insert;
student_id|book_id|
---------- -------
2| 37|
2| 3|