Home > Mobile >  How can I use SQL to apply a ledger of 'actions' to a given table?
How can I use SQL to apply a ledger of 'actions' to a given table?

Time:11-23

I use PostgreSQL 14 to work on a student_books table which manages books borrowed by students. Each student can have zero or more books borrowed at any point in time. The table look like this (the order of rows doesn't matter).

student_id book_id
1 113
2 37
5 94

Furthermore, I have an (append-only) library_ledger table with recent transactions at the library. It keeps track of whether a student borrowed or returned a book, and when. The order of rows matters, it's sorted in ascending order on the first date column:

date action student_id book_id
2022-11-20 09:14:09 borrow 2 3
2022-11-21 17:43:22 return 1 113
2022-11-22 14:03:04 borrow 5 204
2022-11-22 14:03:08 return 5 94
2022-11-22 14:03:15 return 5 204

Given the student_books and library_ledger tables, I'd like to compute the new set of books borrowed by each student. In the above case, I'd like to get the result set

student_id book_id
2 3
2 37

It's not difficult to write e.g. a Java program which processes each row in the library_ledger and updates the student_books table by issuing INSERT/DELETE queries. However, I wonder if this can be done in SQL directly.

Maybe if the initial table is grouped by student_id and aggregating the book IDs using array_agg, one could use that as the starting value for an aggregate function which processes the actions in library_ledger by transforming the array using either array_append or array_remove (depending on the value in the action column). At the end, the result could be un-nested.

Is there maybe a simpler way to achieve this, possibly even without using a custom aggregate function?

CodePudding user response:

I'm not sure why you're considering an aggregate function here. I think the basic idea is something like:

Remove the records where the most recent library_ledger action for that student/book is a "return"...

DELETE
FROM student_books
WHERE (student_id, book_id) IN
(
 SELECT student_id, book_id
 FROM
 (
  SELECT student_id, book_id, action, row_number() OVER (PARTITION BY student_id, book_id ORDER BY date DESC) rn
  FROM library_ledger
 ) sq
 WHERE rn = 1 AND action='return' -- Most recent action is "return"
)

... then add in any student/book pairs that don't already exist in student_books and have a most recent action of "borrow"

INSERT INTO student_books
SELECT student_id, book_id
FROM 
(
 SELECT student_id, book_id, action, row_number() OVER (PARTITION BY student_id, book_id ORDER BY date DESC) rn
 FROM library_ledger
) sq
WHERE rn = 1 AND action='borrow' -- Most recent action is "borrow"
AND (student_id, book_id) NOT IN -- Doesn't already exist in table
 (
  SELECT student_id, book_id
  FROM student_books
 )

I don't really see any reason to consider intermediate actions, since only the final one (for each student/book pair) should matter.

You can see my attempt in this fiddle.

CodePudding user response:

From the perspective of journal and ledger, there could be a need to understand between borrows and returns for auditing or accounting purpose. For example,

select student_id, 
       book_id,
       action,
       date,
       count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
       count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
  from library_ledger;

Gives us an idea of:

student_id|book_id|action|date                   |borrow_count|return_count|
---------- ------- ------ ----------------------- ------------ ------------ 
         1|    113|return|2022-11-21 17:43:22.000|           0|           1|
         2|      3|borrow|2022-11-20 09:14:09.000|           1|           0|
         5|     94|return|2022-11-22 14:03:08.000|           0|           1|
         5|    204|borrow|2022-11-22 14:03:04.000|           1|           0|
         5|    204|return|2022-11-22 14:03:15.000|           1|           1|

Based on that, the SQL action is determined:

with cte as (
select student_id, 
       book_id,
       action,
       date,
       count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
       count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
  from library_ledger)
select student_id,
       book_id,
       max(borrow_count) as borrow_count,
       max(return_count) as return_count,
       case 
          when max(borrow_count) > max(return_count) then 'insert'
          when max(borrow_count) < max(return_count) then 'delete'
          else 'noop'
       end as sql_action
  from cte
 group by 1,2;

We get:

student_id|book_id|borrow_count|return_count|sql_action|
---------- ------- ------------ ------------ ---------- 
         1|    113|           0|           1|delete    |
         2|      3|           1|           0|insert    |
         5|     94|           0|           1|delete    |
         5|    204|           1|           1|noop      |

For practices like accounting, there could be a review or an approval before apply those actions. After that, update student_books (kind of like account balance).

-- 1. delete
with cte as (
select student_id, 
       book_id,
       action,
       date,
       count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
       count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
  from library_ledger),
cte_delete as (
select student_id,
       book_id
  from cte
 group by 1,2
 having max(borrow_count) < max(return_count))
delete from student_books sb
 using cte_delete d
 where sb.student_id = d.student_id
   and sb.book_id = d.book_id;

select * from student_books;  
  
-- 2. insert
with cte as (
select student_id, 
       book_id,
       action,
       date,
       count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
       count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
  from library_ledger),
cte_insert as (
select student_id,
       book_id
  from cte
 group by 1,2
 having max(borrow_count) > max(return_count))
 insert into student_books
 select student_id, book_id
   from cte_insert;
student_id|book_id|
---------- ------- 
         2|     37|
         2|      3|
  • Related