I had an interview where they asked me: "Find the most popular class among students using their first enrollment date", where the assumption was that a student would pick their favorite class first. For simplicity, no two EnrollmentDT
could be exactly the same and there are no data issues (e.g. a student can't be enrolled in the same class twice).
They expected me to use a window function, and I'm curious how to do that for this problem.
I quickly setup some seed data as such (I'm aware the seed portion isn't a perfect representation, but I needed something close enough quickly):
IF OBJECT_ID('StudentClass') IS NOT NULL DROP TABLE StudentClass;
IF OBJECT_ID('Class') IS NOT NULL DROP TABLE Class;
IF OBJECT_ID('Student') IS NOT NULL DROP TABLE Student;
CREATE TABLE Student (
StudentID INT IDENTITY(1,1) PRIMARY KEY,
[Name] UNIQUEIDENTIFIER DEFAULT NEWID(),
);
CREATE TABLE Class (
ClassID INT IDENTITY(1,1) PRIMARY KEY,
[Name] UNIQUEIDENTIFIER DEFAULT NEWID(),
ClassLevel INT DEFAULT CAST(CEILING(RAND() * 3) AS INT)
);
CREATE TABLE StudentClass (
StudentClassID INT IDENTITY(1,1),
StudentID INT FOREIGN KEY REFERENCES Student (StudentID),
ClassID INT FOREIGN KEY REFERENCES Class (ClassID),
EnrollmentDT DATETIME2
);
GO
INSERT INTO Student DEFAULT VALUES
GO 50
INSERT INTO Class DEFAULT VALUES
GO 5
DECLARE @StudentIndex INT = 1;
DECLARE @Cycle INT = 1;
WHILE @Cycle <= 5
BEGIN
IF RAND() > 0.5
BEGIN
INSERT INTO StudentClass (StudentID, ClassID, EnrollmentDT)
VALUES
(@StudentIndex, @Cycle, DATEADD(SECOND, CAST(CEILING(RAND() * 10000) AS INT), SYSDATETIME()))
END
SET @StudentIndex = @StudentIndex 1;
IF @StudentIndex = 50
BEGIN
SET @Cycle = @Cycle 1;
SET @StudentIndex = 1;
END
END
But the only thing I could come up with was:
SELECT
sc.ClassID,
COUNT(*) AS IsFavoriteClassCount
FROM
StudentClass sc
INNER JOIN (
SELECT
StudentID,
MIN(EnrollmentDT) AS MinEnrollmentDT
FROM
StudentClass
GROUP BY
StudentID
) sq
ON sc.StudentID = sq.StudentID
AND sc.EnrollmentDT = sq.MinEnrollmentDT
GROUP BY
sc.ClassID
ORDER BY
IsFavoriteClassCount DESC;
Any guidance on their thinking would be greatly appreciated! If I made any errors in my constructions / query, take that as a proper error and not something intentional.
CodePudding user response:
SELECT
ClassID,
COUNT(*) AS IsFavoriteClassCount
FROM
(
SELECT
sc.ClassID,
sc.StudentID,
ROW_NUMBER() OVER (PARTITION BY sc.StudentID
ORDER BY
sc.EnrollmentDT) AS rn
FROM
StudentClass sc
)
t
WHERE
rn = 1
GROUP BY
ClassID
ORDER BY
IsFavoriteClassCount DESC;
The query uses ROW_NUMBER() as a window function to assign a unique row number to each student's first enrollment in a class (based on their enrollment date). The inner query selects the ClassID and the StudentID for each enrollment and adds the row number, partitioned by the StudentID and ordered by the enrollment date. The outer query filters for the rows where the row number is 1, which indicates that these are the first enrollments for each student, and aggregates the data by the ClassID to get the count of how many times each class was the first choice of a student.
CodePudding user response:
You can use FIRST_VALUE()
window function to get the first pick of each student:
WITH cte AS (
SELECT DISTINCT StudentID,
FIRST_VALUE(ClassID) OVER (PARTITION BY StudentID ORDER BY EnrollmentDT) AS ClassID
FROM StudentClass
)
SELECT TOP 1 WITH TIES
ClassID,
COUNT(*) AS IsFavoriteClassCount
FROM cte
GROUP BY ClassID
ORDER BY COUNT(*) DESC;
Remove TOP 1 WITH TIES
to get results for all classes.