I need to find most frequent value of object_of_search
for each ethnicity. How can I achieve this? Subqueries in the SELECT
clause and correlated subqueries are not allowed. Something similar to this:
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
But this does not aggregate and gives me many rows for each ethnicity and object_of_search:
officer_defined_ethnicity | Sas for ethnicity | Arrest rate | Most frequent object of search
--------------------------- ------------------- ------------------ --------------------------------
ethnicity2 | 3 | 66.6666666666667 | Stolen goods
ethnicity3 | 2 | 100 | Fireworks
ethnicity1 | 5 | 60 | Firearms
ethnicity3 | 2 | 100 | Firearms
ethnicity1 | 5 | 60 | Cat
ethnicity1 | 5 | 60 | Dog
ethnicity2 | 3 | 66.6666666666667 | Firearms
ethnicity1 | 5 | 60 | Psychoactive substances
ethnicity1 | 5 | 60 | Fireworks
And should be something like this:
officer_defined_ethnicity | Sas for ethnicity | Arrest rate | Most frequent object of search
--------------------------- ------------------- ------------------ --------------------------------
ethnicity2 | 3 | 66.6666666666667 | Stolen goods
ethnicity3 | 2 | 100 | Fireworks
ethnicity1 | 5 | 60 | Firearms
Table on fiddle.
Query:
SELECT DISTINCT
stopAndSearches.officer_defined_ethnicity,
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;
Table:
CREATE TABLE IF NOT EXISTS stopAndSearches(
"sas_id" bigserial PRIMARY KEY,
"officer_defined_ethnicity" VARCHAR(255),
"object_of_search" VARCHAR(255),
"outcome" VARCHAR(255)
);
CodePudding user response:
Updated: Fiddle
This should address the specific "which object per ethnicity" question.
Note, this doesn't address ties in the count. That wasn't part of the question / request.
Adjust your SQL to include this logic, to provide that detail:
WITH cte AS (
SELECT officer_defined_ethnicity
, object_of_search
, COUNT(*) AS n
, ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT * FROM cte
WHERE rn = 1
;
Result:
officer_defined_ethnicity | object_of_search | n | rn |
---|---|---|---|
ethnicity1 | Cat | 1 | 1 |
ethnicity2 | Stolen goods | 2 | 1 |
ethnicity3 | Fireworks | 1 | 1 |
CodePudding user response:
SELECT DISTINCT ON (1)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stop_and_searches
GROUP BY 1, 2
ORDER BY 1, 3 DESC, 2;
Or more explicitly:
SELECT DISTINCT ON (officer_defined_ethnicity)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stop_and_searches
GROUP BY officer_defined_ethnicity, object_of_search
ORDER BY officer_defined_ethnicity, ct DESC, object_of_search;
officer_defined_ethnicity | object_of_search | ct
--------------------------- ------------------ ----
ethnicity1 | Cat | 1
ethnicity2 | Stolen goods | 2
ethnicity3 | Firearms | 1
db<>fiddle here
Since DISTINCT ON
is applied after GROUP BY
we only need a single query level.
- Aggregate to get counts per
(officer_defined_ethnicity, object_of_search)
withGROUP BY
. - Pick the row with the highest count per
officer_defined_ethnicity
withDISTINCT ON
.
I added object_of_search
as third ORDER BY
item to act as tiebreaker and produce a deterministic result:
In case of ties, pick the earliest object_of_search
according to alphabetical sort order.
Adapt to your needs.
See:
Simpler and typically faster than a subquery with row_number()
: