Home > Back-end >  Joining columns on partial strings
Joining columns on partial strings

Time:11-30

There are probably multiple ways to do this or frame this question

I have two tables I would like to join.

Table A

ID                          Column1A              Column2A
56ade9.mobile.app.          1A_data               2A_data
ko9j77:web.source.          1A_data               2A_data
9eej:mobile.app.            1A_data               2A_data
pdfg6334df.web.source       1A_data               2A_data
gyu8ssl                     1A_data               2A_data

Table B

ID                          Column1B              Column2B
9eej                        1B_data               2B_data
56ade9                      1B_data               2B_data
gyu8ssl                     1B_data               2B_data
pdfg6334df                  1B_data               2B_data
ko9j77                      1B_data               2B_data

And I'd like to join on the ID columns for a result of: (the row order doesn't matter)

ID        Column1A Column2A Column1B Column2B
9eej
56ade9
pdfg633df
gyu8ssl
ko9j77

However the ID column in table A has additional, superfluous string characters which are delimitated by either a . or a :.

Some IDs in Table A will not have the superfluous strings added to the ID, such as the last row in Table A with ID gyu8ssl

The superfluous string sequence after the actual id in Table A is not consistent, and can be more than the examples I listed.

The actual id is not uniform in character length; however the first break in the alpha numeric sequence with a punctuation indicates the end of the actual ID which should be joined on.

Attempt:

I'm a bit stumped on how to approach this. I was looking at STRING_SPLIT but it seems to only take in one delimiter at a time and I don't think that will get me what I want. I'm actually kind of stymied.

Any suggestions?

CodePudding user response:

select b.ID
, a.ColumnA1
, a.ColumnA1
, b.ColumnB1
, b.ColumnB2
from TableB b
  inner join TableA a
    on case 
         when replace(a.ID, ':', '.') like '%.%'
           then substring(a.ID, charindex('.', replace(a.ID, ':', '.')))
         else a.ID
       end = b.ID

CodePudding user response:

The first step to solving this is writing a SELECT query for Table A with an expression over the ID column such that the expression results match the values in table B. But I'll gloss over that part; I expect other answers here will be more than adequate to learn how to write this expression, even if they show it in an ON clause instead of the SELECT list.

Instead, I'll spend my time suggesting a better way to use this expression. Once you are confident in the accuracy of the expression, alter the table to add it as a computed column. This will allow you to make the column PERSISTED, which can improve performance by saving from doing this work at run time every time you run the query. You can also index it, which can improve performance yet again.

  • Related