I have an XML instance that contains processing instructions. I want a specific one (the schematron declaration):
<?xml-model href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
There may or may not be other processing instructions present, so I can't rely on its position in the DOM; it is guaranteed, however, that there will be at most one such Schematron file reference. Thus, I get it like so:
XProcessingInstruction p = d.Nodes().OfType<XProcessingInstruction>()
.Where(x => x.Target.Equals("xml-model") &&
x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""))
.FirstOrDefault();
In the example given, the content of p.Data is the string
href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"
I need to extract the path specified via @href (i.e. in this example I would want the string ../../a/b/c.sch) without double quotes. In other words: I need the substring after href=" and before the next ". I'm trying to achieve my goal with LINQ:
var a = p.Data.Split(' ').Where(s => s.StartsWith("href=\""))
.Select(s => s.Substring("href=\"".Length))
.Select(s => s.TakeWhile(c => c != '"'));
I would have thought this gave me an IEnumerable<char> which I could then convert to a string in one of the ways described here, but that's not the case: according to LINQPad, I seem to be getting an IEnumerable<IEnumerable<char>>, which I can't manage to turn into a string.
How could this be done correctly using LINQ? Maybe I'd better be using Regex within LINQ?
Edit: After writing this down, I came up with a working solution, but it seems very inelegant:
string a = new string
(
p.Data.Substring(p.Data.IndexOf("href=\"") + "href=\"".Length)
.TakeWhile(c => c != '"').ToArray()
);
What would be a better way?
CodePudding user response:
Try this:
var input = @"<?xml-model href=""../../a/b/c.sch"" schematypens=""http://purl.oclc.org/dsdl/schematron""?>";
var match = Regex.Match(input, @"href=""(.*?)""");
var url = match.Groups[1].Value;
That gives me ../../a/b/c.sch in url.
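If you would rather avoid a regex, one possible alternative is to let the XML parser read the pseudo-attributes for you. The sketch below wraps p.Data in a throwaway element and reads href as an ordinary attribute. It assumes the data consists only of well-formed name="value" pairs (true for the example, but not guaranteed for processing instructions in general); the helper name GetPseudoAttribute and the wrapper element name are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PseudoAttributeDemo
{
    // Hypothetical helper: returns the value of a pseudo-attribute, or null
    // if the processing instruction or the pseudo-attribute is missing.
    public static string GetPseudoAttribute(XProcessingInstruction pi, string name)
    {
        if (pi == null) return null;

        // Assumption: pi.Data has attribute syntax, so it can be parsed
        // as the attribute list of a dummy element.
        XElement wrapper = XElement.Parse("<wrapper " + pi.Data + "/>");

        // The explicit cast yields null when the attribute does not exist.
        return (string)wrapper.Attribute(name);
    }

    public static void Main()
    {
        XDocument d = XDocument.Parse(
            "<?xml-model href=\"../../a/b/c.sch\" " +
            "schematypens=\"http://purl.oclc.org/dsdl/schematron\"?><root/>");

        // Same selection as in the question.
        XProcessingInstruction p = d.Nodes().OfType<XProcessingInstruction>()
            .FirstOrDefault(x => x.Target == "xml-model" &&
                x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""));

        Console.WriteLine(GetPseudoAttribute(p, "href")); // ../../a/b/c.sch
    }
}
```

Unlike the string-based approaches, this also copes with single-quoted values and with a different order of the pseudo-attributes, and it returns null instead of throwing when no matching processing instruction was found.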
Please don't use Regex for general XML parsing, but for this situation it's fine.
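As for the LINQ attempt in the question: Where returns a sequence of matching tokens, and the last Select maps each token to its own IEnumerable<char>, which is why the result is a sequence of sequences. A minimal sketch of one way to fix it is to turn each inner sequence into a string and then take the first element. The helper name ExtractHref is made up for illustration, and splitting on spaces assumes that the path itself contains no spaces.

```csharp
using System;
using System.Linq;

public static class LinqHrefDemo
{
    // Hypothetical helper: extracts the href value from the PI data,
    // or returns null if no token starts with href=".
    public static string ExtractHref(string data)
    {
        const string prefix = "href=\"";

        return data.Split(' ')
            .Where(s => s.StartsWith(prefix, StringComparison.Ordinal))
            .Select(s => s.Substring(prefix.Length))
            // Convert each char sequence to a string right away, so the
            // pipeline yields IEnumerable<string>, not IEnumerable<IEnumerable<char>>.
            .Select(s => new string(s.TakeWhile(c => c != '"').ToArray()))
            // Collapse the sequence to a single string (at most one match is expected).
            .FirstOrDefault();
    }

    public static void Main()
    {
        string data = "href=\"../../a/b/c.sch\" " +
                      "schematypens=\"http://purl.oclc.org/dsdl/schematron\"";
        Console.WriteLine(ExtractHref(data)); // ../../a/b/c.sch
    }
}
```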