There are probably multiple ways to do this or frame this question
I have two tables I would like to join.
Table A
ID Column1A Column2A
56ade9.mobile.app. 1A_data 2A_data
ko9j77:web.source. 1A_data 2A_data
9eej:mobile.app. 1A_data 2A_data
pdfg6334df.web.source 1A_data 2A_data
gyu8ssl 1A_data 2A_data
Table B
ID Column1B Column2B
9eej 1B_data 2B_data
56ade9 1B_data 2B_data
gyu8ssl 1B_data 2B_data
pdfg6334df 1B_data 2B_data
ko9j77 1B_data 2B_data
And I'd like to join on the ID columns for a result of: (the row order doesn't matter)
ID Column1A Column2A Column1B Column2B
9eej
56ade9
pdfg633df
gyu8ssl
ko9j77
However the ID column in table A has additional, superfluous string characters which are delimitated by either a .
or a :
.
Some IDs in Table A will not have the superfluous strings added to the ID, such as the last row in Table A with ID gyu8ssl
The superfluous string sequence after the actual id in Table A is not consistent, and can be more than the examples I listed.
The actual id is not uniform in character length; however the first break in the alpha numeric sequence with a punctuation indicates the end of the actual ID which should be joined on.
Attempt:
I'm a bit stumped on how to approach this. I was looking at STRING_SPLIT
but it seems to only take in one delimiter at a time and I don't think that will get me what I want. I'm actually kind of stymied.
Any suggestions?
CodePudding user response:
select b.ID
, a.ColumnA1
, a.ColumnA1
, b.ColumnB1
, b.ColumnB2
from TableB b
inner join TableA a
on case
when replace(a.ID, ':', '.') like '%.%'
then substring(a.ID, charindex('.', replace(a.ID, ':', '.')))
else a.ID
end = b.ID
CodePudding user response:
The first step to solving this is writing a SELECT
query for Table A with an expression over the ID
column such that the expression results match the values in table B. But I'll gloss over that part; I expect other answers here will be more than adequate to learn how to write this expression, even if they show it in an ON
clause instead of the SELECT
list.
Instead, I'll spend my time suggesting a better way to use this expression. Once you are confident in the accuracy of the expression, alter the table to add it as a computed column. This will allow you to make the column PERSISTED
, which can improve performance by saving from doing this work at run time every time you run the query. You can also index it, which can improve performance yet again.