I am currently learning SQL and ran into a problem. Through my searches I have found that looping in SQL is a big no-no, so I was wondering if anyone could point me into the correct direction?
The dataframe looks like this:
Group | ATP Date | JTH Date |
---|---|---|
A | 5/17/2022 | 6/17/2022 |
A | 5/17/2022 | Null |
B | 5/17/2022 | Null |
A | 5/16/2022 | 6/16/2022 |
B | 5/16/2022 | 6/16/2022 |
B | 5/15/2022 | 6/17/2022 |
B | 5/15/2022 | Null |
A | 5/14/2022 | 6/1/2022 |
A | 5/13/2022 | Null |
A | 5/13/2022 | 6/1/2022 |
A | 5/13/2022 | 6/5/2022 |
I am trying to make a query to pull this:
Date | Group | CountNo | CountYes | Ratio (No/Yes) |
---|---|---|---|---|
5/17/2022 | A | 1 | 1 | 1 |
5/17/2022 | B | 0 | 1 | 0 |
5/16/2022 | A | 1 | 0 | Null |
5/16/2022 | B | 2 | 1 | 2 |
5/14/2022 | A | 1 | 0 | Null |
5/13/2022 | A | 2 | 0 | Null |
This is what I currently created:
max(ATP_Date) as Date,
Group,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) as CountNo,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as CountYes,
sum(
case
when ATP_date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) / sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as ratio
from
dataframe
where group = "A"
GROUP BY group
Which outputs this:
Date | Group | CountNo | CountYes | Ratio |
---|---|---|---|---|
5/17/2022 | A | 318 | 1064 | 0.3 |
This is what I want, but I need to do it for each date for ~ last 4 years, so it looks like the second table posted. I could manually edit the dates for each query, but that would take forever. This made me think of looping. I believe I would basically need to loop through the Select portion with dates, in order to get the output I want. If anyone has advice or could point in me in the correction direction, it would be greatly appreciated, thanks.
CodePudding user response:
I couldn't write the query as a comment, so posting it here. If it's not what you are expecting, let me know will delete this.
Assuming 2022-05-18 is constant (as per your sample)
Creating sample table
create or replace table dt_query
(group string, atp_date date, jth_date date);
insert into dt_query values
('A','2022-05-17','2022-06-17')
,('A','2022-05-17',NULL)
,('B','2022-05-17',NULL)
,('A','2022-05-16','2022-06-16')
,('B','2022-05-16','2022-06-16')
,('B','2022-05-16','2022-06-15')
,('B','2022-05-16',NULL)
Slightly modified your select statement
select
max(ATP_Date) as Date,
Group,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) as CountNo,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as CountYes,
sum(
case
when ATP_date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) / sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as ratio
from
dt_query
GROUP BY group,atp_date
The result matches with what you are expecting.
CodePudding user response:
To avoid using loops here you can include atp_date
into the group by
clause which will result in 1 row for each combination of that date plus the "grouping" column
It isn't clear why you compare to '2022-05-18' but this appears to be the day following the maximum date found in atp_date
. So to avoid hardcoding, you could approach it by using a derived table of 1 row, cross joined to the data:
SELECT
ATP_Date
, grouping
, sum(CASE
WHEN ATP_Date < cj.max_dt AND JTH_Date > cj.max_dt
THEN 1
ELSE 0
END) AS CountNo
, sum(CASE
WHEN ATP_Date < cj.max_dt AND JTH_Date IS NULL
THEN 1
ELSE 0
END) AS CountYes
, sum(CASE
WHEN ATP_date < cj.max_dt AND JTH_Date > cj.max_dt
THEN 1
ELSE 0
END)
/ sum(CASE
WHEN ATP_Date < cj.max_dt AND JTH_Date IS NULL
THEN 1
ELSE NULL
END) AS ratio
FROM dt_query
CROSS JOIN (select max(atp_date) interval '1 day' max_dt from dt_query) AS cj
GROUP BY
grouping
, atp_date
ORDER BY
atp_date DESC
, grouping
atp_date | grouping | countno | countyes | ratio :--------- | :------- | ------: | -------: | ----: 2022-05-17 | A | 1 | 1 | 1 2022-05-17 | B | 0 | 1 | 0 2022-05-16 | A | 1 | 0 | null 2022-05-16 | B | 2 | 1 | 2 2022-05-14 | A | 1 | 0 | null 2022-05-13 | A | 2 | 1 | 2
db<>fiddle here
nb: to avoid issues with the term "group" I have used the column name "grouping" instead, and the example sql is written in postgres so there may be some syntax that needs alteration (e.g. the addition of 1 day). Also note that the ratio
calcuation can result in a divide by zero error so instead of zero I used NULL.