Home > Software design >  Create correlation function in MySQL 5
Create correlation function in MySQL 5

Time:12-09

I am trying to create a correlation function in MySql 5 that takes in two columns (x, y) and returns a single Decimal value (correlation_coefficient).

Here is my function definition

-- Calculate pearson correlation coefficient.
-- INPUT: X and Y should be columns of data (decimal) 
-- RETURN: A value between -1 and 1 depending on the strength of the relationship between the 2 columns e.g. 0.43.

DELIMITER $$

CREATE FUNCTION PearsonCorrelation(
    x Decimal(10,1),
    y Decimal(10,1)
)
RETURNS Decimal(10,1)
DETERMINISTIC
BEGIN
    DECLARE correlation_coefficient  DECIMAL(3,2);
    SET correlation_coefficient = (avg(x * y) - avg(x) * avg(y)) / (sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y)));
    RETURN(correlation_coefficient);
END $$

DELIMITER ;

However, when I execute the function call, I get the error 'invalid use of group function'. Here are some test data, where the correlation coefficient returned from the dataset should be 0.86.

CREATE TABLE data_table
(
x Decimal(3,1) NOT NULL,
y Decimal(3,1) NOT NULL
)
INSERT INTO data_table 
VALUES(11.2, 10.4),
(9.7, 4.6),
(4.5, 2.1)

I would intend to call this function as follows:

Select PearsonCorrelation(x,y) as corrcoef
FROM data_table

Given feedback, my question may be reframed to say, is it possible to pass in table columns as arguments in to the correlation function and if so, how do I adapt the function to achieve this?

CodePudding user response:

You're trying to pass columns of table data to a MySQL stored function. You Can't Do That™. Aggregate functions like AVG() only work in the context of SQL statements like SELECT -- statements that mention FROM sometable.

You could try something like this.

 SELECT (AVG(x * y) - AVG(x) * AVG(y)) /
        (SQRT(AVG(x * x) - AVG(x) * AVG(x)) *
         SQRT(AVG(y * y) - AVG(y) * AVG(y))) correlation_coefficient
  FROM data_table;

You should know that MySQL translates all numbers to double-precision floating point before doing arithmetic on them. DECIMAL(3,1) is probably not a good choice of data format.

Some other makes and versions of table server let you create your own aggregate functions, but not MariaDB / MySQL.

CodePudding user response:

AVG() Function has no meaning here..
only in a select statement like this, where it will give you the average of values.

SELECT * FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);

just remove all AVG() from the code

  • Related