I have two tables:
CREATE TABLE Employee (
Site ???? ????,
WorkTypeId char(2) NOT NULL,
Emp_NO int NOT NULL,
"Date" ???? NOT NULL
);
CREATE TABLE PTO (
Site ???? ????,
WorkTypeId char(2) NULL,
Emp_NO int NOT NULL,
"Date" ???? NOT NULL
);
I would like to update values in PTO
's WorkTypeId
column:
EMP NO
inEmployee
(the lookup table) andPTO
should match.- A single
WorkTypeId
value should be picked from only the first occurrence of the month.
For example, given this sample input data:
TABLE Employee
:
Site | WorkTypeId | Emp_NO | Date |
---|---|---|---|
5015 | MB | 1005 | 2022-02-01 |
5015 | MI | 1005 | 2022-02-04 |
5015 | PO | 1005 | 2022-02-04 |
5015 | ME | 2003 | 2022-01-01 |
5015 | TT | 2003 | 2022-01-10 |
TABLE PTO
:
Site | WorkTypeId | Emp_NO | Date |
---|---|---|---|
5015 | 1005 | 2022-02-03 | |
5015 | 1005 | 2022-02-14 | |
5014 | 2003 | 2022-01-09 |
For example:
- Given
Employee
withEmp_NO = 1005
...- ...there are 3 rows for that
Emp_NO
in theEmployee
table, with 3 distinctWorkTypeId
values, but differingDate
values. - So pick the
WorkTypeId
value for the earliestDate
(2022-02-01), which is'MB'
- So
Emp_NO
getsWorkTypeId = 'MB'
. - And use that single value to fill
1005
'sWorkTypeId
cells in thePTO
table. - But also match by month.
- ...there are 3 rows for that
So the expected output in the PTO
table is
Site | WorkTypeId | Emp_NO | Date |
---|---|---|---|
5015 | MB | 1005 | 2022-02-03 |
5015 | MB | 1005 | 2022-02-14 |
5014 | ME | 2003 | 2022-01-09 |
CodePudding user response:
Getting a value from a column different to the column used in a MIN
/MAX
expression in a GROUP BY
query still remains a surprisingly difficult thing to do in SQL, and while modern versions of the SQL language (and SQL Server) make it easier, they're completely non-obvious and counter-intuitive to most people as it necessarily involves more advanced topics like CTEs, derived-tables (aka inner-queries), self-joins and windowing-functions despite the conceptually simple nature of the query.
Anyway, as-ever in modern SQL, there's usually 3 or 4 different ways to accomplish the same task, with a few gotchas.
Preface:
As
Site
,Date
,Year
, andMonth
are all keywords in T-SQL, I've escaped them with double-quotes, which is the ISO/ANSI SQL Standards compliant way to escape reserved words.- SQL Server supports this by default. If (for some ungodly reason) you have
SET QUOTED IDENTIFIER OFF
then change the double-quotes to square-brackets:[]
- SQL Server supports this by default. If (for some ungodly reason) you have
I assume that the
Site
column in both tables is just a plain' ol' data column, as such:- It is not a
PRIMARY KEY
member column. - It should not be used as a
GROUP BY
. - It should not be used in a
JOIN
predicate.
- It is not a
All of the approaches below assume this database state:
CREATE TABLE "Employee" (
"Site" int NOT NULL,
WorkTypeId char(2) NOT NULL,
Emp_NO int NOT NULL,
"Date" date NOT NULL
);
CREATE TABLE "PTO" (
"Site" int NOT NULL,
WorkTypeId char(2) NULL,
Emp_NO int NOT NULL,
"Date" date NOT NULL
);
GO
INSERT INTO "Employee" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, 'MB', 1005, '2022-02-01' ),
( 5015, 'MI', 1005, '2022-02-04' ),
( 5015, 'PO', 1005, '2022-02-04' ),
( 5015, 'ME', 2003, '2022-01-01' ),
( 5015, 'TT', 2003, '2022-01-10' );
INSERT INTO "PTO" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, NULL, 1005, '2022-02-03' ),
( 5015, NULL, 1005, '2022-02-14' ),
( 5014, NULL, 2003, '2022-01-09' );
- Both approaches define CTEs
e
andp
that extendEmployee
andPTO
respectively to add computed"Year"
and"Month"
columns, which avoids having to repeatedly useYEAR( "Date" ) AS "Year"
inGROUP BY
andJOIN
expressions.- I suggest you add those as computed-columns in your base tables, if you're able, as they'll be useful generally anyway. Don't forget to index them appropriately too.
Approach 1: Composed CTEs with elementary aggregates, then UPDATE
:
WITH
-- Step 1: Extend both the `Employee` and `PTO` tables with YEAR and MONTH columns (this simplifies things later on):
e AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",
YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
Employee
),
p AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",
YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
PTO
),
-- Step 2: Get the MIN( "Date" ) value for each group:
minDatesForEachEmployeeMonthYearGroup AS (
SELECT
e.Emp_No,
e."Year",
e."Month",
MIN( "Date" ) AS "FirstDate"
FROM
e
GROUP BY
e.Emp_No,
e."Year",
e."Month"
),
-- Step 3: INNER JOIN back on `e` to get the first WorkTypeId in each group:
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (
/* WARNING: This query will fail if multiple rows (for the same Emp_NO, Year and Month) have the same "Date" value. This can be papered-over with GROUP BY and MIN, but I don't think that's a good idea at all). */
SELECT
e.Emp_No,
e."Year",
e."Month",
e.WorkTypeId AS FirstWorkTypeId
FROM
e
INNER JOIN minDatesForEachEmployeeMonthYearGroup AS q ON
e.Emp_NO = q.Emp_NO
AND
e."Date" = q.FirstDate
)
-- Step 4: Do the UPDATE.
-- *Yes*, you can UPDATE a CTE (provided the CTE is "simple" and has a 1:1 mapping back to source rows on-disk).
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
p
INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
p.Emp_No = f.Emp_No
AND
p."Year" = f."Year"
AND
p."Month" = f."Month"
WHERE
p.WorkTypeId IS NULL;
Here's a screenshot of SSMS showing the contents of the PTO
table from before, and after, the above query runs:
Approach 2: Skip the self-JOIN
with FIRST_VALUE
:
This approach gives a shorter, slightly simpler query, but requires SQL Server 2012 or later (and that your database is running in compatibility-level 110 or higher).
Approach 2b's plan looks like this:
@SOS's plan, for comparison, is a lot simpler... and I honestly don't know why, but it does show how good SQL Server's query optimizer is thesedays:
CodePudding user response:
Update 2002-03-05
Leaving this here for posterity, but I'd recommend reading Dai's excellent write up on different approaches to this problem.
Try a CROSS APPLY to grab the first Employee record with a matching month and year.
Note: Use OUTER APPLY to always return all PTO records, even when no matching WorkTypeId was found.
SELECT p.Site
, e.WorkTypeId
, p.Emp_No
, p.[Date]
FROM PTO p CROSS APPLY
(
SELECT TOP 1 WorkTypeId
FROM Employee e
WHERE e.Emp_No = p.Emp_No
AND MONTH(e.[Date]) = MONTH(p.[Date])
AND YEAR(e.[Date]) = YEAR(p.[Date])
ORDER BY [Date] ASC
)e
Results:
Site | WorkTypeId | Emp_No | Date ---: | :--------- | -----: | :--------- 5015 | MB | 1005 | 2022-02-03 5015 | MB | 1005 | 2022-02-14 5014 | ME | 2003 | 2022-01-09
db<>fiddle here