Home > Back-end >  SQL - use start and end date to calculate number of students enrolled
SQL - use start and end date to calculate number of students enrolled

Time:08-02

I have a table with three columns:

  • student_id, int(6)
  • start_yearmonthdate, varchar(30)
  • end_yearmonthdate, varchar(30)

Each student's enrollment period is provided by start_yearmonthdate and end_yearmonthdate. Where the values are in the yymmdd format. Also, if a student has not specified an end date yet, the end_yearmonthdate will show '000000'. You can consider the timeframe is in the year of 2022.

I want to know for each month, how many students are enrolled. For example, for student 100001, Jan, Feb, and Mar are all the enrollment months for this student.

Here is the table:

student_id start_yearmonthdate end_yearmonthdate
100001 220101 220331
100002 220201 220731
100003 220101 221231
100004 220701 221031
100005 220601 220930
100006 220101 221231
100006 220201 221231
100007 220701 000000
100008 220601 000000

You can use sqlfiddle.com to build the table:

create table training (
  student_id int(6),
  start_yearmonthdate varchar(30),
  end_yearmonthdate varchar(30)
);

insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100001, '220101', '220331');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100002, '220201', '220731');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100003, '220101', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100004, '220701', '221031');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100005, '220601', '220930');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100006, '220101', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100006, '220201', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100007, '220701', '000000');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100008, '220601', '000000');

CodePudding user response:

MySQL has no time series, so you must generate one to get to the data, which is needed to group the data.

Still every STR_TO_DATE will cost lots of performace.

WITH RECURSIVE t_series as (
    select '2022-01-01' as dt
  UNION
    SELECT DATE_ADD(t_series.dt, INTERVAL 1 DAY) FROM t_series WHERE DATE_ADD(t_series.dt, INTERVAL 1 DAY) <= '2022-12-31'
)
select 

COUNT(DISTINCT t2.student_id)
,EXTRACT(YEAR_MONTH FROM t1.dt)
FROM t_series t1 JOIN training t2 
ON t1.dt BETWEEN STR_TO_DATE(t2.start_yearmonthdate,'%y%m%d') 
AND STR_TO_DATE(IF(t2.end_yearmonthdate = '000000','221231',t2.end_yearmonthdate),'%y%m%d')
GROUP BY EXTRACT(YEAR_MONTH FROM t1.dt);
COUNT(DISTINCT t2.student_id) | EXTRACT(YEAR_MONTH FROM t1.dt)
----------------------------: | -----------------------------:
                            3 |                         202201
                            4 |                         202202
                            4 |                         202203
                            3 |                         202204
                            3 |                         202205
                            5 |                         202206
                            7 |                         202207
                            6 |                         202208
                            6 |                         202209
                            5 |                         202210
                            4 |                         202211
                            4 |                         202212

db<>fiddle here

CodePudding user response:

There are lots of steps involved here, including creating a "months calendar" table using recursive CTE.

I tried my best to make each step clear by using expressive aliases and CTE names; However, you might need to go over each step thoroughly.

WITH Step_0_create_months_calendar AS
    (
    SELECT CAST('2022-01-01' AS date) AS month_1st_day
    UNION ALL
    SELECT DATEADD(MONTH, 1, month_1st_day)
    FROM Step_0_create_months_calendar
    WHERE month_1st_day < '2024-01-01'
    )
, Step_1_real_dates AS
    (
    SELECT student_id
    , CAST('20'   start_yearmonthdate AS date) AS start_yearmonthdate
    , CAST(CASE
            WHEN end_yearmonthdate = '000000' THEN GETDATE()
            ELSE '20'   end_yearmonthdate
        END AS date) AS end_yearmonthdate
    FROM training
    )
, step_2_explode_months_per_student AS
    (
    SELECT *
    FROM Step_1_real_dates AS S1
    INNER JOIN Step_0_create_months_calendar AS clndr
    ON clndr.month_1st_day BETWEEN S1.start_yearmonthdate AND S1.end_yearmonthdate
    )
, Step_3_just_student_year_and_month AS
    (
    SELECT student_id, YEAR(month_1st_day) as year_num ,MONTH(month_1st_day) AS month_num
    FROM step_2_explode_months_per_student
    )
-- Last step: Aggregating
SELECT year_num, month_num, COUNT(DISTINCT student_id) AS num_of_students
FROM Step_3_just_student_year_and_month
GROUP BY year_num, month_num
ORDER BY year_num, month_num
  • Related