I have a table that looks like the following
V1 | Group | time
----------------------
a | g1 | 1
b | g1 | 2
c | g1 | 3
d | g2 | 1
e | g2 | 2
f | g2 | 3
g | g3 | 5
h | g3 | 6
i | g3 | 7
I wish to filter only a sample Group
s, but take all of the associated examples with the filtered groups. For example, assuming that the desired sample size is 2 groups
, and (g1,g3)
were sampled, the desired table will be
V1 | Group | time
----------------------
a | g1 | 1
b | g1 | 2
c | g1 | 3
g | g3 | 5
h | g3 | 6
i | g3 | 7
CodePudding user response:
I think this is what you want.
Note that I renamed group
to g
because it is usually a reserved SQL Keyword and that you may have to use another rand function depending on your DB (I assume SQL Server here).
drop table if exists #have;
CREATE TABLE #have
(
V1 VARCHAR(3) NOT NULL
, g VARCHAR(3)
, time INT
);
insert into #have
values
('a', 'g1', 1)
, ('b', 'g1', 2)
, ('c', 'g1', 3)
, ('d', 'g2', 1)
, ('e', 'g2', 2)
, ('f', 'g2', 3)
, ('g', 'g3', 5)
, ('h', 'g3', 6)
, ('i', 'g3', 7)
;
select a.*
from #have a join
(
select top 2 g
from #have
group by g
order by newid()
) b
on a.g = b.g;
CodePudding user response:
A simple query like this would give you the results you want:
SELECT * FROM table WHERE Group IN('g1','g3');
Alternative:
SELECT * FROM table WHERE Group = 'g1' OR Group = 'g3';
Note GROUP
is a reserved word in most RDBMSs so you probably need to escape them. In MySQL/MariaDB you can use backticks:
SELECT * FROM `table` WHERE `Group` IN('g1','g3');
If ANSI SQL mode is enabled you can use double quotes :
SELECT * FROM "table" WHERE "Group" IN('g1','g3');
In T-SQL you can use brackets:
SELECT * FROM [table] WHERE [Group] IN('g1','g3');
If you want a randomized sample of two groups, and use SQL Server
you should go with PeterClemmensens excellent answer
With a few changes in syntax his solution works well also for MySQL/MariaDB 5.5 and above:
SELECT a.*
FROM `table` a JOIN (
SELECT `Group`
FROM `table`
GROUP BY `Group`
ORDER BY RAND()
LIMIT 2
) b
ON a.`Group` = b.`Group`;