How to sample for users and dates within a given timeframe in SQL?

Time:09-28

I am using Redshift SQL and would like to sample by user ID, but I am not sure how to specify that.

Let's say my table looks like this:

user_id | date        | other columns
1       | 2020-01-01  | ...
1       | 2020-02-01  | ...
2       | 2020-02-11  | ...
...

How do I select 10,000 random (user_id, date) pairs with dates between 2000-01-01 and 2020-01-01 in SQL?

Answer:

We can number the rows in the date range with ROW_NUMBER() over a random ordering and keep the first 10,000:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY RANDOM()) rn
    FROM yourTable
    WHERE date BETWEEN '2000-01-01' AND '2020-01-01'
)

SELECT *
FROM cte
WHERE rn <= 10000;
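Both answers sample rows, not pairs. If a user can have several rows on the same day, row-level sampling can return the same (user_id, date) pair more than once and favours pairs that have many rows. A minimal sketch of sampling distinct pairs instead: de-duplicate inside a subquery, then apply the same ORDER BY RANDOM() ... LIMIT pattern. The SQL uses only portable constructs and is assumed (not verified here) to run unchanged on Redshift; the Python around it is just a harness that builds a hypothetical toy table in an in-memory SQLite database (standard-library sqlite3) so the query can be run locally. `yourTable` and the column names follow the answer above; the data is made up.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical toy stand-in for "yourTable": 100 users, one day every 20 days
# starting 1999-06-01, and 1-3 duplicate rows per (user_id, date) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (user_id INTEGER, date TEXT, other_columns TEXT)")
start = date(1999, 6, 1)
rows = [
    (uid, (start + timedelta(days=offset)).isoformat(), "...")
    for uid in range(1, 101)
    for offset in range(0, 8_000, 20)
    for _ in range(1 + uid % 3)  # duplicate rows for the same pair
]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# De-duplicate (user_id, date) in the date range first, then shuffle the
# pairs and keep 10,000 of them. ISO-formatted date strings compare correctly.
PAIR_SAMPLE_SQL = """
SELECT user_id, date
FROM (
    SELECT DISTINCT user_id, date
    FROM yourTable
    WHERE date BETWEEN '2000-01-01' AND '2020-01-01'
) AS pairs
ORDER BY RANDOM()
LIMIT 10000
"""
sample = conn.execute(PAIR_SAMPLE_SQL).fetchall()
print(len(sample), len(set(sample)))  # → 10000 10000
```

Dropping the DISTINCT subquery turns this back into plain row sampling, which is what the answers above do.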

Answer:

You can also sort the rows in the date range with ORDER BY random() and keep the first 10,000 with LIMIT (or TOP):

select *
from <table>
where date between '2000-01-01' and '2020-01-01'
order by random()
limit 10000;
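The question also mentions sampling *by user*. If the goal is 10,000 distinct users with one random day each (an interpretation; the question does not state it outright), one sketch is to rank each user's days with ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY RANDOM()), keep rank 1, and then take a random subset of users. Same caveats as before: the data is hypothetical, the harness runs on SQLite (window functions need SQLite 3.25 or newer), and the SQL is assumed but not verified to run unchanged on Redshift.

```python
import sqlite3
from datetime import date, timedelta

# ROW_NUMBER() ... OVER needs SQLite 3.25+ (window function support).
assert sqlite3.sqlite_version_info >= (3, 25, 0)

# Hypothetical toy data: 15,000 users, 5 distinct days each, spread from 1999-06-01.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (user_id INTEGER, date TEXT, other_columns TEXT)")
start = date(1999, 6, 1)
rows = [
    (uid, (start + timedelta(days=(uid * 7 + k * 1_500) % 8_000)).isoformat(), "...")
    for uid in range(1, 15_001)
    for k in range(5)
]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# 1) distinct (user_id, date) pairs in the date range,
# 2) shuffle each user's days and keep one (rn = 1),
# 3) shuffle the users and keep 10,000 of them.
USER_SAMPLE_SQL = """
WITH pairs AS (
    SELECT DISTINCT user_id, date
    FROM yourTable
    WHERE date BETWEEN '2000-01-01' AND '2020-01-01'
),
per_user AS (
    SELECT user_id, date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY RANDOM()) AS rn
    FROM pairs
)
SELECT user_id, date
FROM per_user
WHERE rn = 1
ORDER BY RANDOM()
LIMIT 10000
"""
sample = conn.execute(USER_SAMPLE_SQL).fetchall()
print(len(sample), len({uid for uid, _ in sample}))  # → 10000 10000
```

Each sampled user appears exactly once, so users with many active days are not over-represented the way they are when sampling rows.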