Home > database >  SQL function returns a single record. How to refactor SELECT query to use multiple records
SQL function returns a single record. How to refactor SELECT query to use multiple records

Time:09-02

The following query has a SQL function with a parameter being passed from a SELECT statement. At the same time, that SELECT statement has JOIN and WHERE clauses that use the record returned by function my_func.

CREATE FUNCTION my_func(pId bigint) RETURNS TABLE (table2_id bigint, column_1 bigint, field text) AS $$
BEGIN
  RETURN QUERY SELECT table2_id, table3_id, column_1, field 
    FROM t_name
  LEFT JOIN t2_name on t_name.key = t2_name.key
  WHERE t2_name.pId = pId
    ORDER BY table2_id DESC
    LIMIT 1;
END;$$ LANGUAGE plpgsql;

SELECT column_1, column_2,
FROM table_name 
LEFT JOIN table2_name ON table2_name.table2_id = (SELECT rec.table2_id FROM my_func(table_name.column_1))
WHERE column_1 = (SELECT rec.column_1 FROM my_func(table_name.column_1))
AND table2_name.field = (SELECT rec.field FROM my_func(table_name.column_1))
GROUP BY column_1, column_2
ORDER BY column_1 ASC, column_2;

This works perfectly since my_func has a LIMIT 1 statement that works well with the JOIN and WHERE clause in the query above. However, requirements changed and now I need to remove LIMIT so that my_func returns multiple records.

I could remove the LIMIT 1 statement in the function, and change the JOIN and WHERE clause to look as follows:

LEFT JOIN table2_name ON table2_name.table2_id IN (SELECT rec.table2_id FROM my_func(table_name.column_1))
WHERE column_1 IN (SELECT rec.column_1 FROM my_func(table_name.column_1))
AND table2_name.field IN (SELECT rec.field FROM my_func(table_name.column_1))

All I did was replace the = symbol with IN. However, even though this works, it is not solving what I'm trying to achieve, because it creates an exponential query for every result in the LEFT JOIN and every result in the first WHERE clause and every result in the AND WHERE clause. Clearly, not what I want.

I thought about using a FOR LOOP as a solution

DO
$$
DECLARE
  rec record;
BEGIN
FOR rec IN (SELECT table2_id, table3_id, column_1, field 
    FROM t_name
  LEFT JOIN t2_name on t_name.key = t2_name.key
  ORDER BY table2_id DESC) 
LOOP
  RETURN QUERY EXECUTE 
      SELECT column_1, column_2,
      FROM table_name 
      LEFT JOIN table2_name ON table2_name.table2_id = rec.table2_id
      WHERE column_1 = rec.column_1
      AND table2_name.field = rec.field
      GROUP BY column_1, column_2
      ORDER BY column_1 ASC, column_2;
END LOOP
END;
$$

It was okay to remove pId from the previous function cause that was not critical. But now I'm getting errors:

ERROR: cannot use RETURN QUERY in a non-SETOF function

Any ideas on how to achieve this? In a nutshell, the function worked fine when a single record was returned (forced by LIMIT 1) but not sure how to refactor the whole thing for multiple records. Thanks.

CodePudding user response:

@CetinBasoz made a good point on LEFT JOIN that is actually an INNER JOIN. Based on that and also the fact that you can compare whole rows in the IN expression, you can simplify your function and query like this:

CREATE FUNCTION my_func(pId bigint) RETURNS TABLE (table2_id bigint, column_1 bigint, field text) AS $$
BEGIN
  RETURN QUERY 
  SELECT table2_id, column_1, field 
  FROM t_name
  -- LEFT Join can be commented out, since you have condition on t2_name in the WHERE clause below
  -- LEFT 
  JOIN t2_name 
  on t_name.key = t2_name.key
  WHERE t2_name.pId = pId

-- Below two lines can be commented out, since they don't make sense 
--    ORDER BY table2_id DESC
--    LIMIT 1;
END;$$ LANGUAGE plpgsql;

SELECT column_1, column_2
FROM table_name 
-- Here also, LEFT Join can be commented out, since you have condition on my_func in the WHERE clause below
-- LEFT 
JOIN table2_name 
ON (table2_name.table2_id, column_1, table2_name.field) 
    in (
      SELECT table2_id, column_1, field FROM my_func(table_name.column_1)
      -- this is for your previous logic, with limiting to 1 record, for testing:
      -- ORDER BY table2_id DESC
      -- LIMIT 1
    )
GROUP BY column_1, column_2
ORDER BY column_1 ASC, column_2;

Update I completely overlooked the last query, where pId is removed. This way the function my_func is not required at all, and the final query can be simplified as:

SELECT column_1, column_2
FROM (
    SELECT table2_id, table3_id, column_1, field 
      FROM t_name
    LEFT JOIN t2_name on t_name.key = t2_name.key
  ) rec
JOIN table_name on table_name.column_1 = rec.column_1
JOIN table2_name 
   on table2_name.table2_id = rec.table2_id
  and table2_name.field = rec.field
GROUP BY column_1, column_2
ORDER BY column_1 ASC, column_2;
  • Related