Extract substring between startsequence and endsequence in C# using LINQ

Time:04-16

I have an XML instance that contains processing instructions, and I want a specific one (the Schematron declaration):

<?xml-model href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>

There may be other processing instructions present besides this one, so I can't rely on its position in the DOM. On the other hand, it is guaranteed that there will be at most one such Schematron file reference. So I retrieve it like this:

// Find the (at most one) xml-model PI that references a Schematron schema.
XProcessingInstruction p = d.Nodes().OfType<XProcessingInstruction>()
   .FirstOrDefault(x => x.Target == "xml-model" &&
      x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""));

In the example given, the content of p.Data is the string

href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"
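To see the query and the resulting Data string end to end, here is a minimal self-contained sketch. It assumes `d` is an `XDocument` (the question doesn't say which type it is) and uses a made-up `<root/>` element as the document body.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample document: the PI from the question plus a placeholder root element.
var d = XDocument.Parse(
    "<?xml-model href=\"../../a/b/c.sch\" schematypens=\"http://purl.oclc.org/dsdl/schematron\"?>" +
    "<root/>");

// Same lookup as in the question: match on the PI target and the schematypens value.
var p = d.Nodes()
    .OfType<XProcessingInstruction>()
    .FirstOrDefault(x => x.Target == "xml-model" &&
        x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""));

// Data holds the text after the target name, without the surrounding <? ?>.
Console.WriteLine(p?.Data);
```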

I need to extract the path specified via @href (i.e. in this example, the string ../../a/b/c.sch) without the double quotes. In other words, I need the substring after href=" and before the next ". I'm trying to achieve this with LINQ:

var a = p.Data.Split(' ').Where(s => s.StartsWith("href=\""))
       .Select(s => s.Substring("href=\"".Length))
       .Select(s => s.TakeWhile(c => c != '"'));

I would have thought this gave me an IEnumerable<char>, which I could then convert to a string in one of the ways described here, but that's not the case: according to LINQPad, I get an IEnumerable<IEnumerable<char>>, which I can't manage to turn into a string.
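The nested type arises because Where and Select run once per space-separated token, so each element of the result is itself a TakeWhile sequence of characters. A minimal sketch of one way to flatten it: turn each inner sequence into a string with `new string(...ToArray())`, then take the first (and only expected) match. Like the original attempt, this assumes the href value contains no spaces.

```csharp
using System;
using System.Linq;

// The p.Data value from the example above.
string data = "href=\"../../a/b/c.sch\" schematypens=\"http://purl.oclc.org/dsdl/schematron\"";
const string prefix = "href=\"";

var href = data.Split(' ')
    .Where(s => s.StartsWith(prefix))
    .Select(s => s.Substring(prefix.Length))
    // Materialize each IEnumerable<char> into a string ...
    .Select(s => new string(s.TakeWhile(c => c != '"').ToArray()))
    // ... and collapse the outer sequence to a single value (null if there is no href).
    .FirstOrDefault();

Console.WriteLine(href); // ../../a/b/c.sch
```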

How can this be done correctly with LINQ? Or would I be better off using Regex within LINQ?


Edit: After writing this up, I came up with a working solution, but it seems very inelegant:

// Skip past href=" and keep characters up to the closing quote.
string a = new string
   (
      p.Data.Substring(p.Data.IndexOf("href=\"") + "href=\"".Length)
      .TakeWhile(c => c != '"').ToArray()
   );

What would be a better way?
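One arguably tidier alternative drops LINQ in favour of plain IndexOf/Substring and wraps the logic in a small reusable helper. `SubstringBetween` is a hypothetical name for this sketch, not a framework method; it returns null when either delimiter is missing.

```csharp
using System;

string data = "href=\"../../a/b/c.sch\" schematypens=\"http://purl.oclc.org/dsdl/schematron\"";
Console.WriteLine(SubstringBetween(data, "href=\"", "\"")); // ../../a/b/c.sch

// Hypothetical helper: returns the text between the first occurrence of `start`
// and the next occurrence of `end` after it, or null if either is not found.
static string? SubstringBetween(string s, string start, string end)
{
    int from = s.IndexOf(start, StringComparison.Ordinal);
    if (from < 0) return null;
    from += start.Length;

    int to = s.IndexOf(end, from, StringComparison.Ordinal);
    if (to < 0) return null;

    return s.Substring(from, to - from);
}
```

Ordinal comparison is used because the delimiters are fixed ASCII markers, not culture-sensitive text.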

Answer:

Try this:

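using System.Text.RegularExpressions;

var input = @"<?xml-model href=""../../a/b/c.sch"" schematypens=""http://purl.oclc.org/dsdl/schematron""?>";
// The lazy (.*?) captures everything up to the first closing quote after href="
var match = Regex.Match(input, @"href=""(.*?)""");
var url = match.Groups[1].Value;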

That gives me ../../a/b/c.sch in url.
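Since the question asked about Regex within LINQ, here is a hedged sketch that applies the same pattern directly to the Data of the processing instruction and checks `Match.Success`, so a missing href yields null rather than an empty string. The sample document with a `<root/>` element is an assumption for the demo.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical sample document containing the Schematron PI.
var d = XDocument.Parse(
    "<?xml-model href=\"../../a/b/c.sch\" schematypens=\"http://purl.oclc.org/dsdl/schematron\"?>" +
    "<root/>");

var href = d.Nodes()
    .OfType<XProcessingInstruction>()
    .Where(x => x.Target == "xml-model" &&
                x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""))
    // Same lazy pattern as the answer, applied to the PI's Data string.
    .Select(x => Regex.Match(x.Data, "href=\"(.*?)\""))
    .Where(m => m.Success)
    .Select(m => m.Groups[1].Value)
    .FirstOrDefault(); // null if there is no such PI or it has no href

Console.WriteLine(href);
```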

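Please don't use Regex for general XML parsing, but this situation is fine: LINQ to XML exposes a processing instruction's content only as the raw Data string and does not parse pseudo-attributes such as href="..." for you.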
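If you would rather let an XML parser handle the quoting, one possible alternative (not part of the original answer) is to wrap the pseudo-attributes in a throwaway element and read them back as real attributes. This assumes the PI data is well-formed attribute syntax; if it is not, XElement.Parse throws an XmlException. The element name `pi` is arbitrary.

```csharp
using System;
using System.Xml.Linq;

// The p.Data value from the question.
string data = "href=\"../../a/b/c.sch\" schematypens=\"http://purl.oclc.org/dsdl/schematron\"";

// Wrap the pseudo-attributes in a dummy element so the XML parser handles
// quoting and entity references; "pi" is an arbitrary element name.
var el = XElement.Parse("<pi " + data + "/>");
var href = el.Attribute("href")?.Value; // null if there is no href attribute

Console.WriteLine(href);
```

Unlike the regex, this also decodes entity references such as &amp; in the value.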
