Capture the correct part of a string based on its relationship

Time:07-06

I am receiving the following response header from an API request (as String):

<https://example1>; rel="previous", <https://example2>; rel="next"

I need to capture the URL that relates only to the "next" rel attribute. I know I can dig that URL out of the string as is, but I cannot guarantee that the links will always be returned in the order previous, next.

What is a good way to ensure I capture the correct URL, the one associated with the "next" rel value?

I would prefer to know how to do this in VB but am happy to translate a C# answer.

CodePudding user response:

In your example, the "previous" and "next" parts are separated by a comma. So we can split the string at the comma, find the part that contains rel="next", and then take the substring that holds the URL. Something like:

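using System;

const string response = "<https://example1>; rel=\"previous\", <https://example2>; rel=\"next\"";

// Each link entry is separated by a comma.
var split = response.Split(',');

var url = string.Empty;
foreach (var part in split)
{
    var trimmed = part.Trim();

    // Skip entries that are not the "next" link, whatever their position.
    if (!trimmed.Contains("rel=\"next\""))
        continue;

    // The URL is everything before the first ';', wrapped in angle brackets.
    url = trimmed.Substring(0, trimmed.IndexOf(';')).Trim();
    url = url.TrimStart('<').TrimEnd('>');
    break; // the "next" link has been found
}

Console.WriteLine(url);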

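Splitting on commas can go wrong if a URL itself contains a comma, so a regular expression is one possible alternative. The following is only a minimal sketch, not a full parser: the LinkHeader class and its Parse method are hypothetical names introduced here for illustration, and the pattern assumes that rel is the first parameter after each URL and holds a single value, as in the string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of any library): maps each rel value to its URL.
public static class LinkHeader
{
    // Matches one entry such as: <https://example2>; rel="next"
    // Assumption: rel is the first parameter after the URL.
    private static readonly Regex Entry = new Regex(
        "<(?<url>[^>]*)>\\s*;\\s*rel=\"?(?<rel>[^\",;]+)\"?",
        RegexOptions.IgnoreCase);

    public static Dictionary<string, string> Parse(string header)
    {
        // Case-insensitive keys, so "Next" and "next" are treated alike.
        var links = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Entry.Matches(header))
        {
            links[m.Groups["rel"].Value.Trim()] = m.Groups["url"].Value;
        }
        return links;
    }
}

public static class Program
{
    public static void Main()
    {
        const string response =
            "<https://example1>; rel=\"previous\", <https://example2>; rel=\"next\"";

        var links = LinkHeader.Parse(response);

        // The lookup is by rel value, so the order of the entries does not matter.
        if (links.TryGetValue("next", out var next))
            Console.WriteLine(next); // prints https://example2
    }
}
```

Because the result is keyed by rel value, the same call also gives the "previous" URL if it is needed, and a missing "next" entry simply makes TryGetValue return false.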

CodePudding user response:

The question's string looks like an HTML link element without the tags. Extracting information from HTML with simple string manipulations or even regular expressions is very fragile because the elements can vary a lot. The same element may be empty or have no content and the browser will treat it the same. There may be newlines or extra whitespace between attributes or attribute names and their values.

It seems the actual problem is how to find the URLs of the next and previous pages in paged results. This used to be done using link rel='next' and link rel='previous' before Google decided to stop using these as an indication that pages are related. Google's page that explains how rel is used for paging doesn't have these links.

It's a lot safer and faster to use an HTML parser like AngleSharp. The following code retrieves the Google page and searches for the HREF of a link rel='search' element. AngleSharp allows searching for elements using DOM/CSS selectors or LINQ. In this case the code searches using a CSS selector: link[rel='search']

using AngleSharp;
using AngleSharp.Dom;
using AngleSharp.Html.Dom;

// Create a browsing context that is able to download documents.
IBrowsingContext context = BrowsingContext.New(Configuration.Default.WithDefaultLoader());
var url = "https://developers.google.com/search/blog/2011/09/pagination-with-relnext-and-relprev";
IDocument document = await context.OpenAsync(url);

// QuerySelector returns null when no element matches the selector.
var link = document.QuerySelector<IHtmlLinkElement>("link[rel='search']");
Console.WriteLine(link?.Href);