I have a table with three columns:
- student_id, int(6)
- start_yearmonthdate, varchar(30)
- end_yearmonthdate, varchar(30)
Each student's enrollment period is provided by start_yearmonthdate and end_yearmonthdate. Where the values are in the yymmdd format. Also, if a student has not specified an end date yet, the end_yearmonthdate will show '000000'. You can consider the timeframe is in the year of 2022.
I want to know for each month, how many students are enrolled. For example, for student 100001, Jan, Feb, and Mar are all the enrollment months for this student.
Here is the table:
student_id | start_yearmonthdate | end_yearmonthdate |
---|---|---|
100001 | 220101 | 220331 |
100002 | 220201 | 220731 |
100003 | 220101 | 221231 |
100004 | 220701 | 221031 |
100005 | 220601 | 220930 |
100006 | 220101 | 221231 |
100006 | 220201 | 221231 |
100007 | 220701 | 000000 |
100008 | 220601 | 000000 |
You can use sqlfiddle.com to build the table:
create table training (
student_id int(6),
start_yearmonthdate varchar(30),
end_yearmonthdate varchar(30)
);
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100001, '220101', '220331');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100002, '220201', '220731');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100003, '220101', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100004, '220701', '221031');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100005, '220601', '220930');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100006, '220101', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100006, '220201', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100007, '220701', '000000');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100008, '220601', '000000');
CodePudding user response:
MySQL has no time series, so you must generate one to get to the data, which is needed to group the data.
Still every STR_TO_DATE
will cost lots of performace.
WITH RECURSIVE t_series as ( select '2022-01-01' as dt UNION SELECT DATE_ADD(t_series.dt, INTERVAL 1 DAY) FROM t_series WHERE DATE_ADD(t_series.dt, INTERVAL 1 DAY) <= '2022-12-31' ) select COUNT(DISTINCT t2.student_id) ,EXTRACT(YEAR_MONTH FROM t1.dt) FROM t_series t1 JOIN training t2 ON t1.dt BETWEEN STR_TO_DATE(t2.start_yearmonthdate,'%y%m%d') AND STR_TO_DATE(IF(t2.end_yearmonthdate = '000000','221231',t2.end_yearmonthdate),'%y%m%d') GROUP BY EXTRACT(YEAR_MONTH FROM t1.dt);
COUNT(DISTINCT t2.student_id) | EXTRACT(YEAR_MONTH FROM t1.dt) ----------------------------: | -----------------------------: 3 | 202201 4 | 202202 4 | 202203 3 | 202204 3 | 202205 5 | 202206 7 | 202207 6 | 202208 6 | 202209 5 | 202210 4 | 202211 4 | 202212
db<>fiddle here
CodePudding user response:
There are lots of steps involved here, including creating a "months calendar" table using recursive CTE.
I tried my best to make each step clear by using expressive aliases and CTE names; However, you might need to go over each step thoroughly.
WITH Step_0_create_months_calendar AS
(
SELECT CAST('2022-01-01' AS date) AS month_1st_day
UNION ALL
SELECT DATEADD(MONTH, 1, month_1st_day)
FROM Step_0_create_months_calendar
WHERE month_1st_day < '2024-01-01'
)
, Step_1_real_dates AS
(
SELECT student_id
, CAST('20' start_yearmonthdate AS date) AS start_yearmonthdate
, CAST(CASE
WHEN end_yearmonthdate = '000000' THEN GETDATE()
ELSE '20' end_yearmonthdate
END AS date) AS end_yearmonthdate
FROM training
)
, step_2_explode_months_per_student AS
(
SELECT *
FROM Step_1_real_dates AS S1
INNER JOIN Step_0_create_months_calendar AS clndr
ON clndr.month_1st_day BETWEEN S1.start_yearmonthdate AND S1.end_yearmonthdate
)
, Step_3_just_student_year_and_month AS
(
SELECT student_id, YEAR(month_1st_day) as year_num ,MONTH(month_1st_day) AS month_num
FROM step_2_explode_months_per_student
)
-- Last step: Aggregating
SELECT year_num, month_num, COUNT(DISTINCT student_id) AS num_of_students
FROM Step_3_just_student_year_and_month
GROUP BY year_num, month_num
ORDER BY year_num, month_num