I'm wondering if there's a way I can combine these two queries into one? I need to get the mean and std dev for each column in the company_feature
table. I then need to take those two values and use them in an aggregation query on each row in the company_feature
table.
/* Get mean and std dev for each feature column */
SELECT
AVG(F1) AS F1_mean,
STDDEV(F1) AS F1_std_dev
FROM company_feature_test cft;
/* Add averages for each feature to the following query */
SELECT
DATA.company_id,
(
CASE
WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
THEN
1 * ((LN(DATA.F1 1) - :F1_mean) / :F1_std_dev ) * 1
WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
THEN
2 * ((LN(DATA.F1 1) - :F1_mean) / :F1_std_dev ) * 2
WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
THEN
3 * ((LN(DATA.F1 1) - :F1_mean) / :F1_std_dev ) * 3
WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
THEN
4 * ((LN(DATA.F1 1) - :F1_mean) / :F1_std_dev ) * 4
ELSE
5 * ((LN(DATA.F1 1) - :F1_mean) / :F1_std_dev )
END
) AS feature_1
FROM (
SELECT company.in_ref_set, company.size, cft.*
FROM company_feature_test cft
JOIN company ON company.id = cft.company_id
GROUP BY company.id
) AS DATA
GROUP BY DATA.company_id;
the tables look like the following (below). There is a relation between company.id and company_feature.company_id.
company table
| id | ref_set | size |
| -- | --- | --- |
| 1 | 0 | SMALL |
| 2 | 1 | LARGE |
company_feature table
| company_id | F1 | F2 |
| --- | --- | --- |
| 1 | 5 | 10 |
| 2 | 15 | 20 |
The query outputs the following data:
| company_id | feature_1 |
| --- | --- |
| 1 | -1.66 |
| 2 | -1.44 |
CodePudding user response:
Yes, you just cross join them:
SELECT
DATA.company_id,
(
CASE
WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
THEN
1 * ((LN(DATA.F1 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 1
WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
THEN
2 * ((LN(DATA.F1 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 2
WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
THEN
3 * ((LN(DATA.F1 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 3
WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
THEN
4 * ((LN(DATA.F1 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 4
ELSE
5 * ((LN(DATA.F1 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev )
END
) AS feature_1
FROM (
SELECT company.in_ref_set, company.size, cft.*
FROM company_feature_test cft
JOIN company ON company.id = cft.company_id
GROUP BY company.id
) AS DATA
CROSS JOIN (
SELECT
AVG(F1) AS F1_mean,
STDDEV(F1) AS F1_std_dev
FROM company_feature_test cft
) AS TOTALS
Note that there's no need to group by in the outer query; there will already only be one row per company.
Note that you still seem to be doing conditional aggregation incorrectly, if that is what you are trying to do; assuming there are multiple rows in cft for each company, you will be selecting an arbitrary F1 for each company. Default settings in newer versions of mysql will prohibit this.