Home > Software design >  Subquery with order by random does not work on database with 500.000.000 rows
Subquery with order by random does not work on database with 500.000.000 rows

Time:08-04

I want to perform an update query on a offline database with DB Browser for SQLite.

I tested my query on a few rows and its working perfectly there, but not with my database which has 500.000.000 rows . It looks like the random subquery is not executed at all there and the first rows of the group by are taken.

The query:

UPDATE  
table
SET typ = 3 WHERE id IN (
  SELECT id FROM (
    SELECT * FROM table ORDER BY RANDOM()
  )  
  WHERE typ = 1 GROUP BY idg HAVING COUNT(idg) > 5
)

Sample data:

id |idg| typ
1  | 1 | 1
2  | 1 | 1
3  | 1 | 1
4  | 1 | 1
5  | 1 | 1
6  | 1 | 1
7  | 1 | 1
8  | 2 | 1
9  | 2 | 1
10 | 2 | 1
11 | 2 | 1
12 | 2 | 1
13 | 2 | 1
14 | 2 | 1
15 | 2 | 1

Is there any fix or workaround to execute my query successfully ?

CodePudding user response:

If your version of SQLite is 3.33.0 , you can use the UPDATE ... FROM... syntax, so that you can join to the table a query that uses window function ROW_NUMBER() to check if a specific idg has more than 5 rows and returns a random id:

WITH cte AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
  FROM tablename
  WHERE typ = 1
)
UPDATE tablename AS t
SET typ = 3 
FROM cte AS c
WHERE t.id = c.id AND c.rn = 6; -- rn = 6 makes sure that there are at least 6 rows

See the demo.

For SQLite 3.25.0 use the operator IN with ROW_NUMBER() window function:

UPDATE tablename
SET typ = 3 
WHERE id IN (
  SELECT id 
  FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
    FROM tablename
    WHERE typ = 1
  )
  WHERE rn = 6 -- rn = 6 makes sure that there are at least 6 rows 
);

See the demo.

  • Related