I have two date columns in two different tables which I need to join. However, they appear in different formats, making the join difficult. How can I update the column values in Table_1 to make the join with Table_2?
Table_1.DATE
0013-01-01,
0013-01-02,
0013-01-03, etc..
Table_2.DATE
2013-01-01,
2013-01-02,
2013-01-03, etc...
CodePudding user response:
The base answer if they are "dates" is to add 2000 years, and see if they are equal:
with table_1(date) as (
select * from values
('0013-01-01'::date),
('0013-01-02'::date),
('0013-01-03'::date)
), table_2(date) as (
select * from values
('2013-01-01'::date),
('2013-01-02'::date),
('2013-01-03'::date)
)
select
t1.*
,t2.*
from table_1 as t1
join table_2 as t2
on dateadd(year, 2000, t1.date) = t2.date
;
DATE | DATE_2 |
---|---|
0013-01-01 | 2013-01-01 |
0013-01-02 | 2013-01-02 |
0013-01-03 | 2013-01-03 |
There are some gotcha's where if you are doing massive work the dateadd should be pushed earlier into the CTE/pipeline to avoid duplicate work. Also if you have normal date in the tables, I would do the join twice (four times actually) and join, or again manipulate the data in a CTE.
like:
with table_1(date) as (
select * from values
('0013-01-01'::date),
('0013-01-02'::date),
('0013-01-03'::date),
('2013-01-04'::date),
('2013-01-05'::date),
('0013-01-06'::date)
), table_2(date) as (
select * from values
('2013-01-01'::date),
('2013-01-02'::date),
('2013-01-03'::date),
('0013-01-04'::date),
('2013-01-05'::date),
('0013-01-06'::date)
)
select
t1.*
,t2.*
from table_1 as t1
join table_2 as t2
on dateadd(year, 2000, t1.date) = t2.date
union all
select
t1.*
,t2.*
from table_1 as t1
join table_2 as t2
on dateadd(year, 2000, t2.date) = t1.date
union all
select
t1.*
,t2.*
from table_1 as t1
join table_2 as t2
on t2.date = t1.date
;
DATE | DATE_2 |
---|---|
0013-01-01 | 2013-01-01 |
0013-01-02 | 2013-01-02 |
0013-01-03 | 2013-01-03 |
2013-01-04 | 0013-01-04 |
2013-01-05 | 2013-01-05 |
0013-01-06 | 0013-01-06 |
can be faster than:
select
t1.*
,t2.*
from table_1 as t1
join table_2 as t2
on (dateadd(year, 2000, t1.date) = t2.date)
OR dateadd(year, 2000, t2.date) = t1.date
OR t2.date = t1.date
as the compiler knows less about your data, so can write (in complex cases) the logic poorly.
So, if you want to pre_alter, you need to avoid the duplicates of the both to high which for the Year 2000 -> 4000 can be avoided with a filter in the CTE, but dual year 0000 will double match, so to handle this your need to track the total "changes applied" and exclude the both case like:
with table_1(date) as (
select * from values
('0013-01-01'::date),
('0013-01-02'::date),
('0013-01-03'::date),
('2013-01-04'::date),
('2013-01-05'::date),
('0013-01-06'::date)
), table_2(date) as (
select * from values
('2013-01-01'::date),
('2013-01-02'::date),
('2013-01-03'::date),
('0013-01-04'::date),
('2013-01-05'::date),
('0013-01-06'::date)
), fix_t1 as (
select date, dateadd(year, y.y, t1.date) as new_date, y.y
from table_1 as t1
cross join (values (0),(2000)) as y(y)
--where new_date < '3000-01-01'::date
), fix_t2 as (
select date, dateadd(year, y.y, t2.date) as new_date, y.y
from table_2 as t2
cross join (values (0),(2000)) as y(y)
--where new_date < '3000-01-01'::date
)
select
t1.*
,t2.*
from fix_t1 as t1
join fix_t2 as t2
on t1.new_date = t2.new_date and t1.y t2.y < 4000
;
now these later answers all look way more complex than the three branched OR case, and they are. But really if you have massive data, tell the DB exactly what you want it todo, can pay good dividends, at the expense of more complex SQL to read. As always, test the code, and pick the cost/perf trade-off that meets your needs.