I am trying to create a correlation function in MySql 5 that takes in two columns (x, y) and returns a single Decimal value (correlation_coefficient).
Here is my function definition
-- Calculate pearson correlation coefficient.
-- INPUT: X and Y should be columns of data (decimal)
-- RETURN: A value between -1 and 1 depending on the strength of the relationship between the 2 columns e.g. 0.43.
DELIMITER $$
CREATE FUNCTION PearsonCorrelation(
x Decimal(10,1),
y Decimal(10,1)
)
RETURNS Decimal(10,1)
DETERMINISTIC
BEGIN
DECLARE correlation_coefficient DECIMAL(3,2);
SET correlation_coefficient = (avg(x * y) - avg(x) * avg(y)) / (sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y)));
RETURN(correlation_coefficient);
END $$
DELIMITER ;
However, when I execute the function call, I get the error 'invalid use of group function'. Here are some test data, where the correlation coefficient returned from the dataset should be 0.86.
CREATE TABLE data_table
(
x Decimal(3,1) NOT NULL,
y Decimal(3,1) NOT NULL
)
INSERT INTO data_table
VALUES(11.2, 10.4),
(9.7, 4.6),
(4.5, 2.1)
I would intend to call this function as follows:
Select PearsonCorrelation(x,y) as corrcoef
FROM data_table
Given feedback, my question may be reframed to say, is it possible to pass in table columns as arguments in to the correlation function and if so, how do I adapt the function to achieve this?
CodePudding user response:
You're trying to pass columns of table data to a MySQL stored function. You Can't Do That™. Aggregate functions like AVG() only work in the context of SQL statements like SELECT -- statements that mention FROM sometable.
You could try something like this.
SELECT (AVG(x * y) - AVG(x) * AVG(y)) /
(SQRT(AVG(x * x) - AVG(x) * AVG(x)) *
SQRT(AVG(y * y) - AVG(y) * AVG(y))) correlation_coefficient
FROM data_table;
You should know that MySQL translates all numbers to double-precision floating point before doing arithmetic on them. DECIMAL(3,1)
is probably not a good choice of data format.
Some other makes and versions of table server let you create your own aggregate functions, but not MariaDB / MySQL.
CodePudding user response:
AVG()
Function has no meaning here..
only in a select statement like this, where it will give you the average of values.
SELECT * FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
just remove all AVG() from the code