I've read through the thread "C# HttpWebRequest command to get directory listing" and can get the following code to work:
using System;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
namespace HTTPDirListing
{
    public class MyDirListing
    {
        public static string GetDirectoryListingRegexForUrl(string url)
        {
            if (url.Equals("https://aeronav.faa.gov/d-tpp/"))
            {
                return "\\\"([^\"]*)\\\"";
            }
            throw new NotSupportedException();
        }
        public static void Main(String[] args)
        {
            string url = "https://aeronav.faa.gov/d-tpp/";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    string html = reader.ReadToEnd();
                    Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
                    MatchCollection matches = regex.Matches(html);
                    if (matches.Count > 0)
                    {
                        foreach (Match match in matches)
                        {
                            if (match.Success)
                            {
                                Console.WriteLine(match.ToString());
                            }
                        }
                    }
                }
            }
            Console.ReadLine();
        }
    }
}
Below is the raw html after "reader.ReadToEnd()":
My current regex, which I admit I just copied from the original thread, returns the following:
So my question is: using a regex, how can I return not only what I am returning now, but also the date associated with each subfolder?
I need to build a URL based on the latest subfolder, as determined by its date stamp. Unfortunately, the list is not sorted by date. Based on the current directory listing, I would build a URL that points the user to "/d-tpp/2203/", because its date, 3/3/2022, is the most recent.
CodePudding user response:
Expand your regex to include the associated date text, and use named capture groups:
var regex = new Regex(@"(?<date>\d{1,2}/\d{1,2}/\d{4})[^""]*\""(?<path>[^""]*)\""");
...
foreach (Match match in regex.Matches(html)) {
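    // The invariant culture keeps parsing independent of the machine's regional settings.
    var date = DateTime.ParseExact(match.Groups["date"].Value, @"M\/d\/yyyy", System.Globalization.CultureInfo.InvariantCulture);
    var path = match.Groups["path"].Value;
    ...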
}
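To get from the matches to the newest subfolder, you could sort the date/path pairs and take the first one. The following is only a minimal sketch: the class and method names (`LatestFolderFinder`, `FindLatestPath`) are made up for illustration, and the sample HTML is a hypothetical stand-in in the style of an IIS directory listing, since the real page content is not reproduced here. It assumes every date appears before the quoted path it belongs to.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class LatestFolderFinder
{
    // Same pattern as above: a date, then any non-quote text, then a quoted path.
    private static readonly Regex EntryRegex =
        new Regex(@"(?<date>\d{1,2}/\d{1,2}/\d{4})[^""]*\""(?<path>[^""]*)\""");

    // Returns the path paired with the most recent date, or null if nothing matched.
    public static string FindLatestPath(string html)
    {
        return EntryRegex.Matches(html)
            .Cast<Match>() // MatchCollection is non-generic on .NET Framework
            .Select(m => new
            {
                Date = DateTime.ParseExact(m.Groups["date"].Value, "M/d/yyyy",
                                           CultureInfo.InvariantCulture),
                Path = m.Groups["path"].Value
            })
            .OrderByDescending(e => e.Date)
            .Select(e => e.Path)
            .FirstOrDefault();
    }

    public static void Main()
    {
        // Hypothetical sample data (assumption): not the actual server response.
        string html =
            "2/3/2022 10:12 AM &lt;dir&gt; <A HREF=\"/d-tpp/2202/\">2202</A><br>" +
            "3/3/2022  9:41 AM &lt;dir&gt; <A HREF=\"/d-tpp/2203/\">2203</A><br>" +
            "1/6/2022  8:30 AM &lt;dir&gt; <A HREF=\"/d-tpp/2201/\">2201</A><br>";

        string latest = FindLatestPath(html);
        if (latest != null)
        {
            // Resolve the site-relative path against the listing URL from the question.
            Uri full = new Uri(new Uri("https://aeronav.faa.gov/d-tpp/"), latest);
            Console.WriteLine(full);
        }
    }
}
```

With the sample above, the entry dated 3/3/2022 sorts first, so the path selected should be "/d-tpp/2203/". If the real listing lays out its entries differently (for example, the date coming after the link), the pattern would need to be adjusted accordingly.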