show top 10 occurring values in column-CodePudding

Looking to know how to query the top 10 occurrences of a value in a column .

Schema as follows:

CREATE TABLE Donations (
  id varchar(255) PRIMARY KEY,
  charityId varchar(255) NOT NULL,
  amount integer, 
  createdAt timestamp,
  updatedAt timestamp
);

Imagine this table has 1m records. Looking to know how I can get an output that would return rows that contain the the top 10 charityId's across all records; along with the total number of records that contain that charityId.

Example data would be something like the following:

| id   | charity id  | amount | created at                   | updated at                   |
|------|-------------|--------|------------------------------|------------------------------|
| "1"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "2"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "3"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "4"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "5"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "6"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "7"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "8"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "9"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "10" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "11" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "12" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "13" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "14" | "charity-3" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "15" | "charity-3" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "16" | "charity-3" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "17" | "charity-4" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "18" | "charity-4" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "19" | "charity-5" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "20" | "charity-5" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "21" | "charity-6" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |

An ideal output would be something like:

| charity id  | occurrences                  |
|-------------|------------------------------|
| "charity-1" | 9                            |
| "charity-2" | 4                            |
| "charity-3" | 3                            |
| "charity-4" | 2                            |
| "charity-5" | 2                            |

CodePudding user response：

Since no tiebreaker has been defined, return results "with ties":

SELECT charityid, count(*) AS occurrencs
FROM   donations
GROUP  BY charityid
ORDER  BY occurrencs DESC  -- no tiebeaker?
FETCH  FIRST 10 ROWS WITH TIES;

See:

Get top row(s) with highest value, with ties

And use count(*) here. Faster, and equivalent while no null values are involved.

CodePudding user response：

By use of group by , count :

SELECT charityId , count(charityId )  As Occurrencs 
    from Donations 
    group by charityId 
    Order by Occurrencs Desc , charityId

CodePudding user response：

basic query :

Aggregation by "charityId"
Count rows
Order by count desc
Keep only first 10 reccords

SELECT "charityId" as "charity id", count(*)  As "occurrences" 
FROM "Donations" 
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10