Home > Software engineering >  Finding rows which match the average value of a column
Finding rows which match the average value of a column

Time:02-11

I'm using Oracle 11g. I have a couple of tables which look something like this (simplified a bit).

CREATE TABLE clients (
    id NUMBER(10) NOT NULL,
    gridx NUMBER(10),
    gridy NUMBER(10),
    CONSTRAINT clients_pk PRIMARY KEY (id)
);

CREATE TABLE requests (
    id NUMBER(10) NOT NULL,
    client_id NUMBER(10),
    CONSTRAINT clients_fk FOREIGN KEY (client_id) REFERENCES clients(id),
    CONSTRAINT requests_pk PRIMARY KEY (id)
);

I need to find the grid sections which have seen the average number of requests. So far, I have come up with the following query which lists the number of requests made for each distinct grid location.

SELECT joined_tbl.gridx, joined_tbl.gridy, COUNT(joined_tbl.id) requests_cnt FROM (
    SELECT c.gridx, c.gridy, r.id
    FROM requests r
    INNER JOIN clients c ON c.id=r.client_id GROUP BY c.gridx, c.gridy, r.id ORDER BY r.id
) joined_tbl
GROUP BY joined_tbl.gridx, joined_tbl.gridy;

This gives the following output

 GRIDX      GRIDY REQUESTS_CNT
     1         -3            2
     2          5            4
    -1         -3            4
    -3         -2            6

Next I need to take the average of the REQUESTS_CNT column and list all the rows which match the average value. How can I go about this? I can't use the AVG function in a WHERE clause, which I considered, so should I be using HAVING?

CodePudding user response:

Your query can be boiled down to:

SELECT c.gridx, c.gridy, COUNT(*) AS requests_cnt
FROM requests r 
JOIN clients c ON c.id = r.client_id 
GROUP BY c.gridx, c.gridy;

i.e. get the request count per gridx/gridy. You want to determine the average count and then only show gridx/gridy pairs that occur exactly that often. In your example this would be the pairs (2|5), (-1|-3), but most often, I guess, that would be no pairs at all.

The easiest approach seems to be to get the average on-the-fly by applying AVG OVER:

SELECT gridx, gridy
FROM
(
  SELECT 
    c.gridx, c.gridy, COUNT(*) AS requests_cnt,
    AVG(COUNT(*)) OVER () AS avg_requests_cnt
  FROM requests r 
  JOIN clients c ON c.id = r.client_id 
  GROUP BY c.gridx, c.gridy
) pairs
WHERE requests_cnt = avg_requests_cnt
ORDER BY gridx, gridy;
  • Related