Home > Mobile >  SQL - How to filter out responses with no variation for survery collection to do multi-linear regres
SQL - How to filter out responses with no variation for survery collection to do multi-linear regres

Time:10-26

I'm new to SQL and I am trying to filter out responses with no variation for survery collection (invalid responses) to do multi-linear regression. Do take note that there is actually more than 100 records for this table and I have simplified it for the illustration.

Database: MySQL 8.0.30 : TLSv1.2 (TablePlus)

  • ID is the respondent number.
  • Variables - x1, x2, x3 is the independent variables.
  • Values - Survery response.

For example this is the current table I have:

ID Variables Values
1 x1 1
1 x2 1
1 x3 1
2 x1 2
2 x2 3
2 x3 4
3 x1 5
3 x2 5
3 x3 5

Scripts used: SELECT ID, Variables, Values FROM TableA GROUP BY ID

I am trying to achieve the following table, where I only want to keep the records which have a variation in the responses:

ID Variables Values
2 x1 2
2 x2 3
2 x3 4

I have tried to use the functions WHERE, DISTINCT, WHERE NOT, HAVING, but I can't seem to get the results that I require, or showing blank most times (like the table below). If anyone is able to help, that would be most helpful.

ID Variables Values

Thank you very much!

CodePudding user response:

Your problem has two parts so you are going to need to use a subquery for this.

  1. you want to know which responses have variations. For this you'll want to group by the responses by the id, and then check that the responses that have the same id all have the same value, or not. For this you can select only those having more than one distinct value:
select `id`
from results 
group by `id`
having count(distinct `values`) > 1
  1. based on that you can just wrap it with a select to get all the fields that you want, ungrouped:
select *
from results
where `id` in (
  select `id`
  from results 
  group by `id`
  having count(distinct `values`) > 1
)

This is MySQL syntax, but shouldn't have that many differences for main dbs

SQL Fiddle: http://sqlfiddle.com/#!9/a266f806/4/0

Hope that helps

CodePudding user response:

Try the following:

WITH ids_with_variations as 
(
SELECT ID
      ,COUNT(DISTINCT [Values]) as unique_value_count
FROM TableA
GROUP BY ID
HAVING COUNT(DISTINCT [Values]) = 3 -- this assumes that you expect each ID to have exactly three responses
)

SELECT *
FROM TableA
WHERE ID IN (SELECT ID FROM ids_with_variations)

This is TSQL dialect. This also assumes that you expect exactly three variations in the value column.

  • Related