Home > Enterprise >  Converting subquery into window function
Converting subquery into window function

Time:02-05

I had an interview where they asked me: "Find the most popular class among students using their first enrollment date", where the assumption was that a student would pick their favorite class first. For simplicity, no two EnrollmentDT could be exactly the same and there are no data issues (e.g. a student can't be enrolled in the same class twice).

They expected me to use a window function, and I'm curious how to do that for this problem.

I quickly setup some seed data as such (I'm aware the seed portion isn't a perfect representation, but I needed something close enough quickly):

IF OBJECT_ID('StudentClass') IS NOT NULL DROP TABLE StudentClass;
IF OBJECT_ID('Class') IS NOT NULL DROP TABLE Class;
IF OBJECT_ID('Student') IS NOT NULL DROP TABLE Student;

CREATE TABLE Student (
    StudentID INT IDENTITY(1,1) PRIMARY KEY,
    [Name] UNIQUEIDENTIFIER DEFAULT NEWID(),
);

CREATE TABLE Class (
    ClassID INT IDENTITY(1,1) PRIMARY KEY,
    [Name] UNIQUEIDENTIFIER DEFAULT NEWID(),
    ClassLevel INT DEFAULT CAST(CEILING(RAND() * 3) AS INT)
);

CREATE TABLE StudentClass (
    StudentClassID INT IDENTITY(1,1),
    StudentID INT FOREIGN KEY REFERENCES Student (StudentID),
    ClassID INT FOREIGN KEY REFERENCES Class (ClassID),
    EnrollmentDT DATETIME2
);
GO

INSERT INTO Student DEFAULT VALUES
GO 50

INSERT INTO Class DEFAULT VALUES
GO 5

DECLARE @StudentIndex INT = 1;
DECLARE @Cycle INT = 1;
WHILE @Cycle <= 5
    BEGIN
        IF RAND() > 0.5
            BEGIN
                INSERT INTO StudentClass (StudentID, ClassID, EnrollmentDT)
                VALUES
                    (@StudentIndex, @Cycle, DATEADD(SECOND, CAST(CEILING(RAND() * 10000) AS INT), SYSDATETIME()))
            END

        SET @StudentIndex = @StudentIndex   1;

        IF @StudentIndex = 50
            BEGIN
                SET @Cycle = @Cycle   1;
                SET @StudentIndex = 1;
            END
    END

But the only thing I could come up with was:

SELECT
    sc.ClassID,
    COUNT(*) AS IsFavoriteClassCount
FROM
    StudentClass sc
    INNER JOIN (
        SELECT
            StudentID,
            MIN(EnrollmentDT) AS MinEnrollmentDT
        FROM
            StudentClass
        GROUP BY
            StudentID
    ) sq
        ON sc.StudentID = sq.StudentID
        AND sc.EnrollmentDT = sq.MinEnrollmentDT
GROUP BY
    sc.ClassID
ORDER BY
    IsFavoriteClassCount DESC;

Any guidance on their thinking would be greatly appreciated! If I made any errors in my constructions / query, take that as a proper error and not something intentional.

CodePudding user response:

SELECT
   ClassID,
   COUNT(*) AS IsFavoriteClassCount 
FROM
   (
      SELECT
         sc.ClassID,
         sc.StudentID,
         ROW_NUMBER() OVER (PARTITION BY sc.StudentID 
      ORDER BY
         sc.EnrollmentDT) AS rn 
      FROM
         StudentClass sc 
   )
   t 
WHERE
   rn = 1 
GROUP BY
   ClassID 
ORDER BY
   IsFavoriteClassCount DESC;

The query uses ROW_NUMBER() as a window function to assign a unique row number to each student's first enrollment in a class (based on their enrollment date). The inner query selects the ClassID and the StudentID for each enrollment and adds the row number, partitioned by the StudentID and ordered by the enrollment date. The outer query filters for the rows where the row number is 1, which indicates that these are the first enrollments for each student, and aggregates the data by the ClassID to get the count of how many times each class was the first choice of a student.

CodePudding user response:

You can use FIRST_VALUE() window function to get the first pick of each student:

WITH cte AS (
  SELECT DISTINCT StudentID,
         FIRST_VALUE(ClassID) OVER (PARTITION BY StudentID ORDER BY EnrollmentDT) AS ClassID
  FROM StudentClass
)
SELECT TOP 1 WITH TIES 
       ClassID,
       COUNT(*) AS IsFavoriteClassCount
FROM cte
GROUP BY ClassID
ORDER BY COUNT(*) DESC;

Remove TOP 1 WITH TIES to get results for all classes.

  • Related