Home > database >  I can't find the issue with this SQL funtion
I can't find the issue with this SQL funtion

Time:11-07

Here is the code of the (My)SQL function (it works outside of the function), but I can't manage to save it as a function for further reuse...

This is an example of the working query:

SELECT
  (
    SUM(Items_Available * Store_Sales) - (SUM(Items_Available) * SUM(Store_Sales)) / COUNT(*)
  ) / (
    SQRT(
      SUM(Items_Available * Items_Available) - (SUM(Items_Available) * SUM(Items_Available)) / COUNT(*)
    ) * SQRT(
      SUM(Store_Sales * Store_Sales) - (SUM(Store_Sales) * SUM(Store_Sales)) / COUNT(*)
    )
  ) as pearson_r
FROM
  store_sales

I've extracted the business logic into this UDF:

DELIMITER $$
DROP FUNCTION IF EXISTS PEARSON_R $$
CREATE FUNCTION PEARSON_R(X INT, Y INT) RETURNS FLOAT DETERMINISTIC
BEGIN
  
  RETURN (SUM(X * Y) - (SUM(X) * SUM(Y)) / COUNT(*)) / (SQRT(SUM(X * X) - (SUM(X) * SUM(X)) / COUNT(*)) * SQRT(SUM(Y * Y) - (SUM(Y) * SUM(Y)) / COUNT(*)));
END$$
DELIMITER ;

When I try to execute this code in command line, I get this useless error message:

> SELECT PEARSON_R(Items_Available, Store_Sales) FROM store_sales;

ERROR 1111 (HY000): Invalid use of group function

Do you have any idea?

I tried to simplify a lot the function but once I use a group function, I have this error.

CodePudding user response:

While I see nothing wrong with the function implementation per-se, I notice that you are using aggregate function without actually aggregating anything (so you're not using the GROUP BY clause)

Post with a similar error as you: ERROR 1111 (HY000): Invalid use of group function

CodePudding user response:

The main problem is that your function refers to aggregate functions : COUNT, SUM... These are not valid in the scope of the function, that cannot refer, infer, or even assume anything about the context of the calling query.

In my opinion the function code should not even compile (but it does) ; we can reproduce the issue with just:

select PEARSON_R (1, 2);
Error: ER_INVALID_GROUP_FUNC_USE: Invalid use of group function

If you want to move some of the logic to a function, then it would need to take each and every aggregate value as an argument. The signature would look like:

CREATE FUNCTION PEARSON_R(
    SUM_X  INT,     -- SUM(X)
    SUM_Y  INT,     -- SUM(Y)
    SUM_XY INT,     -- SUM(X * Y)
    SUM_XX INT,     -- SUM(X * X)
    SUM_YY INT,     -- SUM(Y * Y)
    CNT    INT      -- COUNT(*)
) RETURNS FLOAT DETERMINISTIC    

Then you would invoke it like so in the query:

SELECT PEARSON_R(
    SUM(X),
    SUM(Y),
    SUM(X * Y),
    SUM(X * X),
    SUM(Y * Y),
    COUNT(*)
) as pearson_r 
FROM store_sales;

Note that you could very well use a subquery (or a lateral join) rather than a function:

SELECT ( sum_xy - sum_x * sum_y) / ...
FROM (
    SELECT PEARSON_R(
        SUM(X) sum_x,
        SUM(Y) sum_y,
        SUM(X * Y) sum_xy,
        SUM(X * X) sum_xx,
        SUM(Y * Y) sum_yy,
        COUNT(*) cnt
    ) as pearson_r 
    FROM store_sales
) t
  • Related