Home > Software design >  Is there a way to find each URL in a Varchar field's records, pull out specific chars, and repl
Is there a way to find each URL in a Varchar field's records, pull out specific chars, and repl

Time:07-21

I have a data table called Characteristics with 561 records, that has a column called "FullDescription," each row of which is full of paragraphs featuring one or more URLs that were entered incorrectly. In each row for this column, I need to capture a very specific string, a 6-digit number that exists in each URL, and then replace each entire URL with a very specific URL that ends with that 6 digit numeric value. Each URL to be replaced has 42 characters, and the 6-digit code occupies locations 32 through 37 relative to the URL. Unfortunately, each result contains several URLs, and because it's in paragraph form, there's no way to tell the numeric position of each URL relative to the result it's in.

As an example, one result might have 3 URLs in it. This is not the specific data, but each row will have different words in it. Here is an example of what this query might return:

SELECT FullDesciption
FROM Characteristics
WHERE ID = 400

Here are some words and http://replace.this.org/number/123456.html and then they say some more words and you get another URL http://replace.this.org/number/234567.html and then it keeps going and more URLs appear http://replace.this.org/number/999999.html the end.

I need to find a way to capture each 6-digit number right before the .html for every URL in a result, replace each instance of a URL with a 34 character URL where the first 28 characters are all the same, and the 6-digit number will occupy the last 6 locations, 29 through 34. When all is said and done, after the change, the paragraph above should look like this:

Here are some words and https://www.this.org/number/123456 and then they say some more words and you get another URL https://www.this.org/number/234567 and then it keeps going and more URLs appear https://www.this.org/number/999999 the end.

I've tried using wildcards that don't seem to work in the search function, but everything I've tried isn't working, and perhaps it required functions I haven't used yet.

Thank you!

CodePudding user response:

Ignoring the fact that doing this somewhere other than SQL Server using some regex-replace might be preferable, looking at the challenge of doing it in SQL Server the obvious approach is to break the source data into rows with one URL per row, perform the string replacements, and then re-aggregate.

Making use of OpenJson to split the string (it provides for dependable ordering) and String_Agg to reaggregate yields the following (assuming a fully supported version of SQL Server):

select String_Agg(NewDescription, '') within group (order by seq)
from t
cross apply (
  select IsNull(
    Replace(
      Replace(
        j.[value], Left(j.[value], NullIf(CharIndex('.', j.[value]),0) -1), 'https://www'
      ), '.html',''
    ), j.[value]
  ) NewDescription,
  Convert(int, j.[Key]) Seq
  from OpenJson(Concat('["',replace(String_Escape(FullDescription, 'json'), 'http://', '","'), '"]')) j
)u;

This will almost certainly require some work to deal with edge cases and other content in your source data, but works with what you have provided.

see enter image description here

  • Related