Looking to know how to query the top 10 occurrences of a value in a column .
Schema as follows:
CREATE TABLE Donations (
id varchar(255) PRIMARY KEY,
charityId varchar(255) NOT NULL,
amount integer,
createdAt timestamp,
updatedAt timestamp
);
Imagine this table has 1m records. Looking to know how I can get an output that would return rows that contain the the top 10 charityId's across all records; along with the total number of records that contain that charityId.
Example data would be something like the following:
| id | charity id | amount | created at | updated at |
|------|-------------|--------|------------------------------|------------------------------|
| "1" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "2" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "3" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "4" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "5" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "6" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "7" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "8" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "9" | "charity-1" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "10" | "charity-2" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "11" | "charity-2" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "12" | "charity-2" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "13" | "charity-2" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "14" | "charity-3" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "15" | "charity-3" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "16" | "charity-3" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "17" | "charity-4" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "18" | "charity-4" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "19" | "charity-5" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "20" | "charity-5" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "21" | "charity-6" | 100 | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
An ideal output would be something like:
| charity id | occurrences |
|-------------|------------------------------|
| "charity-1" | 9 |
| "charity-2" | 4 |
| "charity-3" | 3 |
| "charity-4" | 2 |
| "charity-5" | 2 |
CodePudding user response:
Since no tiebreaker has been defined, return results "with ties":
SELECT charityid, count(*) AS occurrencs
FROM donations
GROUP BY charityid
ORDER BY occurrencs DESC -- no tiebeaker?
FETCH FIRST 10 ROWS WITH TIES;
See:
And use count(*)
here. Faster, and equivalent while no null
values are involved.
CodePudding user response:
By use of group by , count :
SELECT charityId , count(charityId ) As Occurrencs
from Donations
group by charityId
Order by Occurrencs Desc , charityId
CodePudding user response:
basic query :
- Aggregation by "charityId"
- Count rows
- Order by count desc
- Keep only first 10 reccords
SELECT "charityId" as "charity id", count(*) As "occurrences"
FROM "Donations"
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10