I'm new to SQL and I am trying to filter out responses with no variation for survery collection (invalid responses) to do multi-linear regression. Do take note that there is actually more than 100 records for this table and I have simplified it for the illustration.
Database: MySQL 8.0.30 : TLSv1.2 (TablePlus)
- ID is the respondent number.
- Variables - x1, x2, x3 is the independent variables.
- Values - Survery response.
For example this is the current table I have:
ID | Variables | Values |
---|---|---|
1 | x1 | 1 |
1 | x2 | 1 |
1 | x3 | 1 |
2 | x1 | 2 |
2 | x2 | 3 |
2 | x3 | 4 |
3 | x1 | 5 |
3 | x2 | 5 |
3 | x3 | 5 |
Scripts used: SELECT ID, Variables, Values FROM TableA GROUP BY ID
I am trying to achieve the following table, where I only want to keep the records which have a variation in the responses:
ID | Variables | Values |
---|---|---|
2 | x1 | 2 |
2 | x2 | 3 |
2 | x3 | 4 |
I have tried to use the functions WHERE, DISTINCT, WHERE NOT, HAVING, but I can't seem to get the results that I require, or showing blank most times (like the table below). If anyone is able to help, that would be most helpful.
ID | Variables | Values |
---|
Thank you very much!
CodePudding user response:
Your problem has two parts so you are going to need to use a subquery for this.
- you want to know which responses have variations. For this you'll want to group by the responses by the
id
, and then check that the responses that have the sameid
all have the samevalue
, or not. For this you can select only those having more than one distinct value:
select `id`
from results
group by `id`
having count(distinct `values`) > 1
- based on that you can just wrap it with a
select
to get all the fields that you want, ungrouped:
select *
from results
where `id` in (
select `id`
from results
group by `id`
having count(distinct `values`) > 1
)
This is MySQL syntax, but shouldn't have that many differences for main dbs
SQL Fiddle: http://sqlfiddle.com/#!9/a266f806/4/0
Hope that helps
CodePudding user response:
Try the following:
WITH ids_with_variations as
(
SELECT ID
,COUNT(DISTINCT [Values]) as unique_value_count
FROM TableA
GROUP BY ID
HAVING COUNT(DISTINCT [Values]) = 3 -- this assumes that you expect each ID to have exactly three responses
)
SELECT *
FROM TableA
WHERE ID IN (SELECT ID FROM ids_with_variations)
This is TSQL dialect. This also assumes that you expect exactly three variations in the value
column.