C# HttpWebRequest Directory Listing using Regex

Time:03-08

I've read through the thread "C# HttpWebRequest command to get directory listing" and can get the following code to work:

using System;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

namespace HTTPDirListing
{
    public class MyDirListing
    {
        
        public static string GetDirectoryListingRegexForUrl(string url)
        {
            if (url.Equals("https://aeronav.faa.gov/d-tpp/"))
            {
                 return "\\\"([^\"]*)\\\"";
            }
            throw new NotSupportedException();
        }
        
        public static void Main(String[] args)
        {
            string url = "https://aeronav.faa.gov/d-tpp/";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    string html = reader.ReadToEnd();
                    Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
                    MatchCollection matches = regex.Matches(html);
                    if (matches.Count > 0)
                    {
                        foreach (Match match in matches)
                        {
                            if (match.Success)
                            {
                                Console.WriteLine(match.ToString());
                            }
                        }
                    }
                }
            }

            Console.ReadLine();
        }
    }
}

Below is the raw html after "reader.ReadToEnd()":

[Screenshot: raw HTML of the directory listing]

My current regex, which I admit I copied from the original thread, returns the following:

[Screenshot: console output of the current regex matches]

So my question is: using a regex, how can I return not only what I am returning now, but also the date associated with each subfolder?

I need to build a URL based on the latest subfolder, and each subfolder is stamped with a date. Unfortunately, the list is not sorted by date. Based on the current directory listing, I would build a URL that points the user to "/d-tpp/2203/", because its date, 3/3/2022, is the most recent.

CodePudding user response:

Expand your regex to include the associated date text, and use named capture groups:

var regex = new Regex(@"(?<date>\d{1,2}/\d{1,2}/\d{4})[^""]*\""(?<path>[^""]*)\""");
...
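// Type the loop variable as Match: on .NET Framework, MatchCollection enumerates as object, so var would not compile here.
foreach (Match match in regex.Matches(html)) {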
    var date = DateTime.ParseExact(match.Groups["date"].Value, @"M\/d\/yyyy", null);
    var path = match.Groups["path"].Value;
    ...
}
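
Since the goal is the most recent subfolder and the listing is not sorted, the loop above can be extended to remember the entry with the newest date. The following is only a minimal sketch under some assumptions: the class and method names (`LatestFolderFinder`, `FindLatestPath`, `BuildLatestUrl`) are made up for illustration, and the sample HTML in `Main` is a hypothetical stand-in for the real listing, which is only shown as a screenshot in the question. It reuses the regex from this answer unchanged.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LatestFolderFinder
{
    // Same pattern as in the answer: a date, any non-quote text, then a quoted path.
    private static readonly Regex EntryRegex = new Regex(
        @"(?<date>\d{1,2}/\d{1,2}/\d{4})[^""]*\""(?<path>[^""]*)\""");

    // Returns the path whose date is the most recent, or null if nothing matched.
    public static string FindLatestPath(string html)
    {
        string latestPath = null;
        DateTime latestDate = DateTime.MinValue;

        foreach (Match match in EntryRegex.Matches(html))
        {
            DateTime date;
            // Skip entries whose date text looks like a date but is not valid (e.g. 13/45/2022).
            bool parsed = DateTime.TryParseExact(
                match.Groups["date"].Value, "M/d/yyyy",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date);

            if (parsed && (latestPath == null || date > latestDate))
            {
                latestDate = date;
                latestPath = match.Groups["path"].Value;
            }
        }
        return latestPath;
    }

    // Resolves the latest path against the listing URL, or returns null if no entry was found.
    public static Uri BuildLatestUrl(string listingUrl, string html)
    {
        string path = FindLatestPath(html);
        return path == null ? null : new Uri(new Uri(listingUrl), path);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical listing lines; the real markup may differ.
        string html =
            "2/3/2022  9:12 AM &lt;dir&gt; <A HREF=\"/d-tpp/2202/\">2202</A><br>" +
            "3/3/2022 10:05 AM &lt;dir&gt; <A HREF=\"/d-tpp/2203/\">2203</A><br>" +
            "1/6/2022  8:47 AM &lt;dir&gt; <A HREF=\"/d-tpp/2201/\">2201</A><br>";

        Uri latest = LatestFolderFinder.BuildLatestUrl("https://aeronav.faa.gov/d-tpp/", html);
        Console.WriteLine(latest); // prints https://aeronav.faa.gov/d-tpp/2203/
    }
}
```

In the question's code, `BuildLatestUrl(url, html)` could be called right after `reader.ReadToEnd()`. If the real listing puts the date after the link rather than before it, the regex would need its two groups swapped.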