Home > Software engineering >  Can these two SQL queries be combined?
Can these two SQL queries be combined?

Time:01-24

I'm wondering if there's a way I can combine these two queries into one? I need to get the mean and std dev for each column in the company_feature table. I then need to take those two values and use them in an aggregation query on each row in the company_feature table.

/* Get mean and std dev for each feature column */
SELECT 
    AVG(F1) AS F1_mean, 
    STDDEV(F1) AS F1_std_dev
FROM company_feature_test cft;
/* Add averages for each feature to the following query */
SELECT
    DATA.company_id,
      (
       CASE
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
         THEN 
           1 * ((LN(DATA.F1   1) - :F1_mean) / :F1_std_dev ) * 1
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
         THEN 
           2 * ((LN(DATA.F1   1) - :F1_mean) / :F1_std_dev ) * 2           
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
         THEN 
           3 * ((LN(DATA.F1   1) - :F1_mean) / :F1_std_dev ) * 3   
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
         THEN 
           4 * ((LN(DATA.F1   1) - :F1_mean) / :F1_std_dev ) * 4
         ELSE
           5 * ((LN(DATA.F1   1) - :F1_mean) / :F1_std_dev )
       END
      ) AS feature_1
FROM (
  SELECT company.in_ref_set, company.size, cft.*
  FROM company_feature_test cft
  JOIN company ON company.id = cft.company_id
  GROUP BY company.id
) AS DATA
GROUP BY DATA.company_id;

the tables look like the following (below). There is a relation between company.id and company_feature.company_id.

company table

| id | ref_set | size |
| -- | --- | --- |
| 1  | 0   | SMALL |
| 2  | 1   | LARGE |


company_feature table

| company_id | F1 | F2 |
| --- | --- | --- |
| 1   | 5   | 10  |
| 2   | 15  | 20  |

The query outputs the following data:

| company_id | feature_1 |
| --- | --- | 
| 1   | -1.66 |
| 2   | -1.44 | 

CodePudding user response:

Yes, you just cross join them:

SELECT
    DATA.company_id,
      (
       CASE
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
         THEN 
           1 * ((LN(DATA.F1   1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 1
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
         THEN 
           2 * ((LN(DATA.F1   1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 2           
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
         THEN 
           3 * ((LN(DATA.F1   1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 3   
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
         THEN 
           4 * ((LN(DATA.F1   1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 4
         ELSE
           5 * ((LN(DATA.F1   1) - TOTALS.F1_mean) / TOTALS.F1_std_dev )
       END
      ) AS feature_1
FROM (
  SELECT company.in_ref_set, company.size, cft.*
  FROM company_feature_test cft
  JOIN company ON company.id = cft.company_id
  GROUP BY company.id
) AS DATA
CROSS JOIN (
  SELECT 
    AVG(F1) AS F1_mean, 
    STDDEV(F1) AS F1_std_dev
  FROM company_feature_test cft
) AS TOTALS

Note that there's no need to group by in the outer query; there will already only be one row per company.

Note that you still seem to be doing conditional aggregation incorrectly, if that is what you are trying to do; assuming there are multiple rows in cft for each company, you will be selecting an arbitrary F1 for each company. Default settings in newer versions of mysql will prohibit this.

  • Related