I'm looking to create a yearly overview of tasks that employees have performed. Tasks are stored in a schedule table where each task has an identifier (e.g. 'ABC'), a starting date, and an end date. The columns of this table thus are: id, employee, task, startDate, and endDate.
Some data could include:
id | employee | task | startDate | endDate |
---|---|---|---|---|
1 | Mr. Anderson | ABC | 2023-01-05 | 2023-01-08 |
2 | Mr. Anderson | DEF | 2023-01-06 | 2023-01-07 |
3 | Ms. Beatrice | ABC | 2023-01-04 | 2023-01-06 |
4 | Mr. Anderson | ABC | 2023-01-10 | 2023-01-12 |
What I would like to do, is show a cross tab, where you have employees on the left, and the number of days these employees have performed every type of task in a given period of time (i.e. 2023).
So, the result would be:
Employee | ABC | DEF |
---|---|---|
Mr. Anderson | 7 | 2 |
Ms. Beatrice | 3 | 0 |
I know how to COUNT the occurrence of every task for every employee, which I'm doing with something like:
SELECT COUNT(id) FROM schedule
WHERE employee='Mr. Anderson' AND task='ABC'
This gives me 2 because there are 2 entries in the database where Mr. Anderson performs ABC. However, he performs ABC for 4 3 days (end date included in the count), so the result should be 7.
Does anyone know how I can update my query to give me the wanted result?
A solution with PIVOT would be acceptable as well (I haven't yet figured out how to work with PIVOT, but if there's a more elegant way to do this (without doing the query for every employee and every task) then please, be my guest.
EDIT: Sorry, I'm on MySQL.
CodePudding user response:
Instead of using COUNT()
us SUM()
and aggregate the number of days between the start and end dates.
While inside SQL, I do not recommend pivoting the data, that's best left outside of SQL. So, aim for three columns; the employee, the task, and the sum of elapsed days. (Both the SQL language and the DBMS implementations are designed with this normalised structure in mind.)
SELECT
employee,
task,
SUM(DATEDIFF(endDate, startDate) 1) total_days
FROM
schedule
GROUP BY
employee,
task
The 1
is needed because you've opted to use inclusive
end dates. (Normally they'd be exclusive
; the range between the 1st of Jan and the 1st of Jan should normally be 0 days)