thank you for reading this question.
Im trying to write a SQL statement that should be able to retrieve certain random elements from a table, exactly X elements from all IDs appearing in a set of IDs. Im a MySQL newbie, so the way I am trying to solve this may not be the correct one - in that case, please guide me.
Right now I have the following table:
------------- ------ ------ ----- --------- ----------------
| Field | Type | Null | Key | Default | Extra |
------------- ------ ------ ----- --------- ----------------
| id0 | int | NO | PRI | NULL | auto_increment |
| id1 | int | YES | MUL | NULL | |
| id2 | int | YES | | NULL | |
------------- ------ ------ ----- --------- ----------------
id0 and id2 are unique, and for every id1 we can have multiple id2 related.
My goal is to be able to feed MySQL a set of id1 values, and for each one, I need to retrieve X number of random elements that pertain to that ID (a list of id2 values related to that id1, list is of length X).
I've been reading about windows and using functions with them, so I created a custom function:
DELIMITER //
CREATE FUNCTION get_x_random (
X INT,
id1_in INT
) RETURNS INT
BEGIN
DECLARE id2_return INT;
SELECT id2 INTO id2_return
FROM table
WHERE id1 = id1_in
ORDER BY RAND() LIMIT X;
RETURN id2_return;
END; //
And I am trying to use this function with the following MySQL statement:
SELECT
get_x_random(1, id1) OVER w AS x
FROM table
WHERE id1 IN (
SELECT t2.id1
FROM table AS t2
WHERE t2.id1 IN SET_OF_VALUES
GROUP BY t2.id1
)
WINDOW w AS (PARTITION BY id1);
However, this is not working (Syntax error near OVER w AS ....)
What am I doing wrong? Is it even possible to use custom functions along with windows? Is there any other possible approach to tackle this problem?
Note: I tried iterating with the Python MySQL wrapper over the different id1's doing a simple
SELECT id2
FROM table
WHERE id1 = {str(id1)}
ORDER BY RAND() LIMIT {int(X)}
And works, however, the accumulated fetching of all results makes it too slow (the length of the list of ids is in the order of hundreds), so I would like to find a purely SQL solution.
Thank you so much for reading until this point.
CodePudding user response:
PostGreSQL user here, I hope my answer applies.
Custom function over window
You declare get_x_random as a function
, but over
expects an aggregate
. The engine has no idea how to combine the rows that over
feeds it.
Apparently creating a custom aggregate function in MySQL requires writing C. However I understand that you need a function that takes a set of ids as input, and returns a set of rows - thankfully we won't need a custom aggregate for that.
Fetching n rows by group
"Get n random rows for each of those ids" : this is a surprisingly awkward thing to do in the DBMS I've seen. Topic is covered in SE sites and others :
- Group by every N records in T-SQL
- https://dba.stackexchange.com/questions/86415/retrieving-n-rows-per-group
- https://blogs.oracle.com/sql/post/how-to-select-the-top-n-rows-per-group-with-sql-in-oracle-database
- ... probably many other places
For DBMS providing windowing like modern MySQL, the usual solution goes like :
with pivot as (
select
id1,
row_number() over (
partition by id1 -- within other rows with the same id1...
order by id0 -- which position does this row have ?
)
from _my_table
)
-- within each group, take only rows 1, 2 and 3
select _my_table.*
from _my_table
join pivot using (id1)
where row_number <= 3
For code producing a sample ("fetch random n rows"), I'll carefully evade the subject by citing this post.
Encapsulating
I guess the original intention in your creating a function was to be able to do something like : select * from _my_table, get_x_rows(_my_table.id) where id in (12, 13, 15)
. So get_x_rows
takes an id and produces n rows.
- it's nice to encapsulate the clunky fetch-n-by-group code but...
- make sure the MySQL planner handles the correlated FROM clause properly (applies filter first, uses indices if applicable, etc)
- compare with the convenience and performance of using a view that computes all ids with pre-calculated samples. Again, only if the planner is smart enough to apply outer filters into the view.
CodePudding user response:
In MySQL, I don't think you can make a custom window function. Certainly not with the stored function language. Also, your stored function wouldn't work anyway, because stored functions can only return a scalar, not a result set. There's no support for something like Python's yield
in MySQL stored functions.
This problem can be solved with standard window functions, I think.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER w AS rownum
FROM mytable
WHERE id1 IN (...set of values...)
WINDOW w AS (PARTITION BY id1 ORDER BY RAND())
)
SELECT * FROM cte WHERE rownum <= ?;
This sorts randomly the rows for each id1
, before using the ROW_NUMBER() function to assign row numbers. So the row numbers will be different every time you run the query.
Then you select for the first X rows, which will be a random subset because they're sorted randomly.