I tried to calculate the average in two different ways. The result should be the same, but mysql gives different result.
data dummy
CREATE TABLE `test_avg` (
`dt` varchar(10) NOT NULL,
`field1` double NOT NULL,
`field2` double NOT NULL,
`field3` varchar(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `test_avg` (`dt`, `field1`, `field2`, `field3`) VALUES
('2022-10-31', 16.1379, 13.0809, 'A'),
('2022-10-31', 12.7579, 0.4458, 'A'),
('2022-10-31', 7.9206, 2.7775, 'A'),
('2022-10-31', 6.3764, 2.666, 'A'),
('2022-10-31', 5.3136, 1.6478, 'A'),
('2022-10-31', 4.5103, 88.178, 'A'),
('2022-10-31', 4.3547, 7.813, 'A'),
('2022-10-31', 4.3542, 3.5463, 'A'),
('2022-10-31', 3.0554, 7.3114, 'A'),
('2022-10-31', 26.3792, 2.2424, 'B'),
('2022-10-31', 9.6861, 28.5324, 'B'),
('2022-10-31', 9.1814, 6.8606, 'B'),
('2022-10-31', 8.0094, 6.2568, 'B'),
('2022-10-31', 7.5882, 548.5715, 'B'),
('2022-10-31', 7.5301, 3.7209, 'B'),
('2022-10-31', 7.4933, 1.3494, 'B'),
('2022-10-31', 7.4388, 22.8762, 'B'),
('2022-10-31', 7.1385, 19.9597, 'B'),
('2022-10-31', 7.1196, 19.8701, 'B');
query1
SELECT dt, AVG(field1), AVG(field2)
FROM test_avg
GROUP BY dt
query2
SELECT a.dt, AVG(a.avg1), AVG(a.avg2)
FROM
(SELECT dt, AVG(field1)AS avg1, AVG(field2)AS avg2, field3
FROM test_avg
GROUP BY dt, field3)a
GROUP BY a.dt
The result should be the same, but mysql gives different result.
CodePudding user response:
An average of averages is different from the average of the individual figures as soon as subsample sizes differ:
(10 20 60 50 25) / 5 = 33
But:
(10 20 60) / 3 = 30
(50 25) / 2 = 37.5
(30 37.5) / 2 = 33.75
The reason is that the weight of original values is being changed in the process.