Home > Software design >  Count distinct based on another column
Count distinct based on another column

Time:06-18

I have the following table:

id       date       cid    birth_place    location
1     2022-01-01     1       France        Germany
2     2022-01-30     1       France        France
3     2022-01-25     2       Spain         Spain
4     2022-01-12     3       France        France
5     2022-02-01     4       England       Italy
6     2022-02-12     1       France        France
7     2022-03-05     5       Spain         England
8     2022-03-08     2       Spain         Spain
9     2022-03-15     2       Spain         Spain
10    2022-03-30     5       Spain         Italy
11    2022-03-22     4       England       England
12    2022-03-22     3       France        England

I want to trunc date to month and then group by date and count distinct cid (customer id) based on location with a condition. The condition is if the customer's location within a given month is equal to their birthplace, then they are counted (where birth_place = location). Check my desired output:

   date       location    count
2022-01-01     France       2
2022-01-01     Spain        1
2022-02-01     Italy        1
2022-02-01     France       1
2022-03-01     Spain        1
2022-03-01     England      3

cid 1 in 2022-01-01 had a location = birth_place, and no other customer had Germany as location in that time period, hence there is no Germany in my desired output location.

Edit: last row had an incorrect value, so I fixed it.

Edit: another view to further explain what I want to achieve. Hope this helps.

id       date       cid    birth_place    location   |  (I want to count distinct cids grouped by trunc('month', date) based on these values)
1     2022-01-01     1       France        Germany   |  France (cid 1 has location France in Jan)
2     2022-01-30     1       France        France    |  France 
3     2022-01-25     2       Spain         Spain     |  Spain
4     2022-01-12     3       France        France    |  France
5     2022-02-01     4       England       Italy     |  Italy (cid 4 has only location Italy in Feb and no England, so Italy is taken)
6     2022-02-12     1       France        France    |  France
7     2022-03-05     5       Spain         England   |  England (cid 5 has two locations that are not Spain, so the first one that the query sees is simply taken)
8     2022-03-08     2       Spain         Spain     |  Spain
9     2022-03-15     2       Spain         Spain     |  Spain
10    2022-03-30     5       Spain         Italy     |  England
11    2022-03-22     4       England       England   |  England
12    2022-03-22     3       France        England   |  England

CodePudding user response:

WITH q1 AS (

    SELECT
        EXTRACT( YEAR  FROM t."date" ) AS "Year",
        EXTRACT( MONTH FROM t."date" ) AS "Month",

        t.cid,
        t.birth_place,
        t.location
    FROM
        theTable AS t
    WHERE
        t.location = t.birth_place
)
SELECT
    "Year",
    "Month",
    "location",
    COUNT( DISTINCT cId ) AS "COUNT( DISTINCT cId )",
    COUNT( * ) AS "CountAll"
FROM
    q1
GROUP BY
    "Year",
    "Month",
    "location"
ORDER BY
    "Year",
    "Month",
    "location"

enter image description here

CodePudding user response:

Simply:

SELECT date_trunc('month', date)::date AS date, location
     , count(DISTINCT cid) AS count
FROM   tbl
WHERE  birth_place = location
GROUP  BY 1, 2
ORDER  BY 1, 2;  -- optional

db<>fiddle here

CodePudding user response:

My approach would be using case...when within a count. That way it works with or without a where filter and therefore allows other aggregates of the data in the same query in the future.

SELECT
    date_trunc('month', date)::date AS date, t.location
    , count(distinct (case when t.location=t.birth_place then t.cid else null end)) as "count"
FROM theTable AS t
WHERE t.location=t.birth_place
GROUP BY date_trunc('month', date)::date, t.location
  • Related