I can't figure out why this bit of code is failing; it seems simple enough.
Code:
string[] ignore = File.ReadAllLines(@"logicfiles\[flag]-[ignore-these-links].txt");
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(rawHtml);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    string linkUrl = link.GetAttributeValue("href", string.Empty);
    if (!ignore.Any(linkUrl.Contains) && linkUrl.Length < 10 && !linkUrl.StartsWith("/"))
    {
        DataGridViewLinks.Rows.Add(linkUrl, keywordUsed, "", "", engineUsed);
    }
}
The above code does not work: it just adds every URL to the DataGrid. The part that is failing is !ignore.Any(linkUrl.Contains). The ignore array contains strings like facebook, youtube, etc. If the URL in linkUrl does NOT contain any of these strings, it should be added to the DataGrid.
But if I do this:
string[] ignore = File.ReadAllLines(@"logicfiles\[flag]-[ignore-these-links].txt");
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(rawHtml);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    string linkUrl = link.GetAttributeValue("href", string.Empty);
    if (linkUrl.Length < 10 && !linkUrl.StartsWith("/"))
    {
        DataGridViewLinks.Rows.Add(linkUrl, keywordUsed, "", "", engineUsed);
    }
}
and remove that part of the condition, the other two conditions work perfectly, so I know the part of the logic that is not working is !ignore.Any(linkUrl.Contains).
I cannot see why. If someone could point out the issue, it would be appreciated.
CodePudding user response:
Your Contains logic is fine. There may be something wrong with the values being read from the text file, such as an upper/lower case mismatch or stray whitespace.
I recommend printing out or otherwise inspecting both the parsed URL values and the filter strings coming in from the text file, to make sure you're comparing what you think you are.
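One way to do that inspection is sketched below. It is only an illustration: the sample entries stand in for whatever File.ReadAllLines actually returns, and IgnoreListInspector and Describe are hypothetical names. Wrapping each entry in brackets and printing its length makes trailing spaces and empty lines visible.

```csharp
using System;

public static class IgnoreListInspector
{
    // Hypothetical helper: brackets plus length expose trailing spaces and empty lines.
    public static string Describe(string entry)
    {
        return $"[{entry}] (length {entry.Length})";
    }

    public static void Main()
    {
        // Stand-in for File.ReadAllLines(...); these sample values are assumptions.
        string[] ignore = { "facebook ", "YouTube", "" };
        string linkUrl = "https://youtube.com/2134";

        foreach (string entry in ignore)
        {
            // string.Contains(string) performs an ordinal, case-sensitive comparison.
            Console.WriteLine($"{Describe(entry)} contained in URL: {linkUrl.Contains(entry)}");
        }
    }
}
```

With these sample values, "facebook " (trailing space) and "YouTube" (different case) never match, which would let every URL through, while the empty entry matches every URL, because an empty string is contained in any string.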
Here is just the logic for the Contains check, with a set of values showing it working:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
    public static void Main()
    {
        var linkUrls = new List<string>{
            "https://youtube.com/2134",
            "https://google.com/2134",
            "https://microsoft.com/2134"
        };
        var ignores = new List<string>{
            "youtube",
            "somethingElse"
        };
        foreach (var linkUrl in linkUrls)
        {
            if (!ignores.Any(linkUrl.Contains))
            {
                Console.WriteLine($"passed filter: {linkUrl}");
            }
        }
    }
}
Output:
passed filter: https://google.com/2134
passed filter: https://microsoft.com/2134
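If inspecting the values does turn up case or whitespace differences, a more tolerant filter could look like the sketch below. This assumes that is the cause, which the question does not confirm. LinkFilter, Normalize, and IsIgnored are hypothetical names. The sketch trims each entry, drops empty lines, and compares case-insensitively via IndexOf with StringComparison.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkFilter
{
    // Trim each entry and drop empty ones; an empty entry would match every URL.
    public static List<string> Normalize(IEnumerable<string> rawEntries)
    {
        return rawEntries
            .Select(entry => entry.Trim())
            .Where(entry => entry.Length > 0)
            .ToList();
    }

    // True when the URL contains any ignore term, regardless of letter case.
    public static bool IsIgnored(string linkUrl, IEnumerable<string> ignores)
    {
        return ignores.Any(term =>
            linkUrl.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    public static void Main()
    {
        // Sample values are assumptions standing in for the text file contents.
        var ignores = Normalize(new[] { " YouTube ", "", "facebook" });
        var linkUrls = new[] { "https://youtube.com/2134", "https://google.com/2134" };

        foreach (var linkUrl in linkUrls)
        {
            if (!IsIgnored(linkUrl, ignores))
            {
                Console.WriteLine($"passed filter: {linkUrl}");
            }
        }
    }
}
```

With these sample values, only https://google.com/2134 passes the filter. In the original loop, the first condition would become !LinkFilter.IsIgnored(linkUrl, ignores), with the list normalized once before the loop.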