Archiving an old PHP website: will any webhost let me totally disable query string support?

Time:01-07

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.

I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.

I ran wget --recursive to spider the site and grab all the static content, so now I have thousands of files with names such as page.php?param1=a&param2=b. I want to serve them exactly as before. That means most of them need Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & characters in the names of the files it looks up on disk. In other words, it must not support query strings.
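To make the requirement concrete, here is a minimal Python sketch of the lookup behaviour described above. It is only an illustration, not something any of the hosts discussed here offer: the function name `literal_lookup` and the archive directory are hypothetical, and it needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path
from urllib.parse import unquote


def literal_lookup(root: Path, request_target: str) -> Path:
    """Map a raw request target to a file, keeping '?' and '&' literal."""
    # Deliberately do NOT split on '?': the whole target, query string
    # included, is treated as the file name inside the archive directory.
    relative = unquote(request_target.lstrip("/"))
    candidate = (root / relative).resolve()
    # Refuse anything that escapes the archive directory (e.g. via "..").
    if not candidate.is_relative_to(root.resolve()):
        raise ValueError("path escapes archive root")
    return candidate
```

With this, a request for /page.php?param1=a&param2=b resolves to a file literally named page.php?param1=a&param2=b under the archive root, which is the behaviour an ordinary static host does not provide.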

And ideally I'd like to host it for free.

  • My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
  • I then considered https://surge.sh/, but hit exactly the same problems.
  • I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
  1. Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
  2. Is there another host which will fit the bill?

If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to @, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.

Answer:

I found a solution with Netlify.

  1. I added the wget options --adjust-extension and --restrict-file-names=windows.

    The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.

    The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with @. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.

    This gives static files with names like myfile.php@param1=value1&param2=value2.html and myfile.php.html.

  2. I did some cleanup. For example, I needed to change a few link and resource paths from relative to absolute because of how Netlify handles the presence or absence of trailing slashes.

  3. I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, a rule can test for specific query parameters and capture their values, and those values can then be used in the destination. Specifying a 200 status code makes Netlify treat the rule as a rewrite rather than a redirect (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify that we are intentionally shadowing it.

    /mypage.php param1=:param1 param2=:param2 /mypage.php@param1=:param1&param2=:param2.html 200!
    /mypage.php param1=:param1 /mypage.php@param1=:param1.html 200!
    /mypage.php param2=:param2 /mypage.php@param2=:param2.html 200!
    

    If some query parameter combinations never occur in the dumped files, the corresponding redirect lines can of course be left out.

    There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.

  4. I wrote a _headers file to set the content type of my RSS file:

    /rss.php
      Content-Type: application/rss+xml
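
With thousands of dumped files, writing the _redirects lines by hand gets tedious. The following is a minimal Python sketch, not part of the original workflow, that derives one rule per distinct combination of path and parameter names from file names of the form produced in step 1. The function name `redirect_rules` is hypothetical. It assumes every query-string file ends in .html and uses @ as the separator, it appends `200!` unconditionally (drop the `!` where no query-string-less file exists), and it lists rules with more parameters first purely as a precaution.

```python
def redirect_rules(filenames):
    """Build Netlify _redirects lines from wget dump file names.

    Expects relative names such as
    'mypage.php@param1=value1&param2=value2.html'.
    """
    shapes = set()
    for name in filenames:
        if "@" not in name or not name.endswith(".html"):
            continue  # no query-string part, so no rewrite rule is needed
        path, _, query = name[: -len(".html")].partition("@")
        # Keep only the parameter names, in the order used in the file name.
        keys = tuple(pair.split("=", 1)[0] for pair in query.split("&") if pair)
        if keys:
            shapes.add((path, keys))

    rules = []
    # Same path together; rules with more parameters come first.
    for path, keys in sorted(shapes, key=lambda s: (s[0], -len(s[1]), s[1])):
        conditions = " ".join(f"{k}=:{k}" for k in keys)
        target_query = "&".join(f"{k}=:{k}" for k in keys)
        rules.append(f"/{path} {conditions} /{path}@{target_query}.html 200!")
    return rules
```

Feeding it the relative paths of the dumped files (for example collected with `pathlib.Path.rglob("*.html")` and converted with `as_posix()`) and writing the returned lines to _redirects reproduces rules of the same shape as the three shown in step 3. The output is worth reviewing by hand before deploying, since parameter values that themselves contain @ or & are not handled.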
    

I hope this helps somebody.
