Home > other >  MySQL - Use custom function over window
MySQL - Use custom function over window

Time:08-03

thank you for reading this question.

Im trying to write a SQL statement that should be able to retrieve certain random elements from a table, exactly X elements from all IDs appearing in a set of IDs. Im a MySQL newbie, so the way I am trying to solve this may not be the correct one - in that case, please guide me.

Right now I have the following table:

 ------------- ------ ------ ----- --------- ---------------- 
| Field       | Type | Null | Key | Default | Extra          |
 ------------- ------ ------ ----- --------- ---------------- 
| id0         | int  | NO   | PRI | NULL    | auto_increment |
| id1         | int  | YES  | MUL | NULL    |                |
| id2         | int  | YES  |     | NULL    |                |
 ------------- ------ ------ ----- --------- ---------------- 

id0 and id2 are unique, and for every id1 we can have multiple id2 related.

My goal is to be able to feed MySQL a set of id1 values, and for each one, I need to retrieve X number of random elements that pertain to that ID (a list of id2 values related to that id1, list is of length X).

I've been reading about windows and using functions with them, so I created a custom function:

DELIMITER //
CREATE FUNCTION get_x_random (
    X INT,
    id1_in INT
) RETURNS INT

BEGIN

    DECLARE id2_return INT;
    SELECT id2 INTO id2_return
    FROM table
    WHERE id1 = id1_in
    ORDER BY RAND() LIMIT X;
    RETURN id2_return;
END; //

And I am trying to use this function with the following MySQL statement:

SELECT
    get_x_random(1, id1) OVER w AS x
    FROM table
    WHERE id1 IN (
        SELECT t2.id1
        FROM table AS t2
        WHERE t2.id1 IN SET_OF_VALUES
        GROUP BY t2.id1
    )
    WINDOW w AS (PARTITION BY id1);

However, this is not working (Syntax error near OVER w AS ....)

What am I doing wrong? Is it even possible to use custom functions along with windows? Is there any other possible approach to tackle this problem?

Note: I tried iterating with the Python MySQL wrapper over the different id1's doing a simple

SELECT id2
FROM table
WHERE id1 = {str(id1)}
ORDER BY RAND() LIMIT {int(X)}

And works, however, the accumulated fetching of all results makes it too slow (the length of the list of ids is in the order of hundreds), so I would like to find a purely SQL solution.

Thank you so much for reading until this point.

CodePudding user response:

PostGreSQL user here, I hope my answer applies.

Custom function over window

You declare get_x_random as a function, but over expects an aggregate. The engine has no idea how to combine the rows that over feeds it.

Apparently creating a custom aggregate function in MySQL requires writing C. However I understand that you need a function that takes a set of ids as input, and returns a set of rows - thankfully we won't need a custom aggregate for that.

Fetching n rows by group

"Get n random rows for each of those ids" : this is a surprisingly awkward thing to do in the DBMS I've seen. Topic is covered in SE sites and others :

For DBMS providing windowing like modern MySQL, the usual solution goes like :

with pivot as (
    select
        id1,
        row_number() over (
            partition by id1 -- within other rows with the same id1...
            order by id0     -- which position does this row have ?
        )
    from  _my_table
)
-- within each group, take only rows 1, 2 and 3
select _my_table.*
from _my_table
join pivot using (id1)
where row_number <= 3

For code producing a sample ("fetch random n rows"), I'll carefully evade the subject by citing this post.

Encapsulating

I guess the original intention in your creating a function was to be able to do something like : select * from _my_table, get_x_rows(_my_table.id) where id in (12, 13, 15). So get_x_rows takes an id and produces n rows.

  • it's nice to encapsulate the clunky fetch-n-by-group code but...
  • make sure the MySQL planner handles the correlated FROM clause properly (applies filter first, uses indices if applicable, etc)
  • compare with the convenience and performance of using a view that computes all ids with pre-calculated samples. Again, only if the planner is smart enough to apply outer filters into the view.

CodePudding user response:

In MySQL, I don't think you can make a custom window function. Certainly not with the stored function language. Also, your stored function wouldn't work anyway, because stored functions can only return a scalar, not a result set. There's no support for something like Python's yield in MySQL stored functions.

This problem can be solved with standard window functions, I think.

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER w AS rownum
    FROM mytable
    WHERE id1 IN (...set of values...)
    WINDOW w AS (PARTITION BY id1 ORDER BY RAND())
)
SELECT * FROM cte WHERE rownum <= ?;

This sorts randomly the rows for each id1, before using the ROW_NUMBER() function to assign row numbers. So the row numbers will be different every time you run the query.

Then you select for the first X rows, which will be a random subset because they're sorted randomly.

  • Related