I have a transaction table which contains date, transaction_id and category (it is a sales table of clothes). It looks like this:
ndate | transaction_id | category |
---|---|---|
11 | 111 | M |
11 | 111 | F |
11 | 112 | M |
12 | 113 | F |
12 | 113 | O |
13 | 114 | M |
I am applying a sql query to aggregate the data and make a new table to reduce the number of rows (as the data is too huge):
select ndate, category, count(distinct transaction_id)
from transaction_table
group by ndate, category;
The output that I am getting is this:
ndate | category | transactions |
---|---|---|
11 | M | 2 |
11 | F | 1 |
12 | F | 1 |
12 | O | 1 |
13 | M | 1 |
As you can see from the transaction_table there have been only 4 transactions but in the output_table the total count of transaction is 6 which is wrongly been done due to the group by of category.
This is the output I am trying to get, I am not sure if a table like this can be made in sql (I am new to sql) :
CodePudding user response:
You haven't told us what you expect, but I think from reading your question that you need to GROUP BY
category, but you only want to count a transaction_id one time if it happened to exist in 2 categories.
We can't have it both ways, so we have to choose - if we list the category in GROUP BY
then we have to decide how to count a transaction that happens in 2 categories.
Option 1: Choose "first" category (shown alphabetically)
WITH CTE_DEDUPE AS (
SELECT
*
FROM
transaction_table QUALIFY ROW_NUMBER() OVER (
PARTITION BY TRANSACTION_ID
ORDER BY
TRANSACTION_DT,
CATEGORY ASC
) = 1
),
BASIC_AGGS AS (
SELECT
TRANSACTION_DT,
CATEGORY,
COUNT(DISTINCT TRANSACTION_ID) as TRANSACTION_ID_COUNTDISTINCT,
COUNT(1) AS AGG_ROW_COUNT
FROM
CTE_DEDUPE
GROUP BY
TRANSACTION_DT,
CATEGORY
)
SELECT
BASIC_AGGS.TRANSACTION_DT,
BASIC_AGGS.CATEGORY,
BASIC_AGGS.TRANSACTION_ID_COUNTDISTINCT
FROM
BASIC_AGGS
Result:
TRANSACTION_DT | CATEGORY | TRANSACTION_ID_COUNTDISTINCT |
---|---|---|
11 | F | 1 |
12 | F | 1 |
13 | M | 1 |
There are other options but I will hold off until you tell us some more information about what you expect, and what specific instance of SQL you are running. I generated the SQL with