Migrating App Engine to use Google Cloud Load Balancer causes ~1hr of downtime

Time:09-17

Overview

I have instances on App Engine with a custom domain and SSL certs provisioned by Google, but now I need to put a Google Cloud Load Balancer in front of it.

I followed the instructions here (with adjustments to do it for App Engine instead of Cloud Run): https://cloud.google.com/load-balancing/docs/https/setting-up-https-serverless

I performed the steps in that guide first and then updated my DNS records in GoDaddy to point to the IP of the Load Balancer.

The problem

The problem is that after I updated my GoDaddy DNS records to point to the Load Balancer's IP, it took almost an hour for the site to become reachable again. When trying to access the site via browser or code, I was getting SSL errors.

Provisioning SSL Certs

The core issue seems to be that the SSL Cert for the Load Balancer was stuck with a status of PROVISIONING and the domain was stuck with a status of FAILED_NOT_VISIBLE, for which the docs say:

The domain's DNS record doesn't resolve to the IP address of the Google Cloud load balancer. To resolve this issue, update the DNS A/AAAA records to point to your load balancer's IP address.

https://cloud.google.com/load-balancing/docs/ssl-certificates/troubleshooting#domain-status
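Since FAILED_NOT_VISIBLE means the domain does not resolve to the load balancer's IP, a quick check like the following might help confirm what a resolver currently returns. This is only a minimal sketch using Python's standard library; the hostname and IP in the usage note are hypothetical placeholders, and the result reflects the local resolver's view, which is not necessarily what Google's validation sees.

```python
import socket


def resolved_ips(hostname, resolver=socket.getaddrinfo):
    """Return the set of IP addresses the resolver gives for hostname.

    `resolver` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    infos = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def points_at_load_balancer(hostname, lb_ip, resolver=socket.getaddrinfo):
    """True only if every resolved address equals the load balancer IP."""
    return resolved_ips(hostname, resolver) == {lb_ip}
```

For example, `points_at_load_balancer("www.example.com", "203.0.113.10")` would return True only once the local resolver hands back the (hypothetical) load balancer address and nothing else. A leftover AAAA record pointing elsewhere would make it return False.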

And these docs say this about PROVISIONING:

Google Cloud is working with the Certificate Authority to issue the certificate. Provisioning a Google-managed certificate might take up to 60 minutes

https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#verify-target-proxy

Is there anything I can do to avoid/minimize this hour of downtime?

I still need to do this to my production project. Maybe if I switch up the order of the steps (point the DNS records to the IP before even creating the SSL certs)?

It seems like it'd be fine if I could get the SSL certs to provision before I update the DNS record to point to the Load Balancer's IP, but updating the DNS seems to be a prerequisite for the SSL cert provisioning to even start.

It's funny, because I already have SSL certs for these domains from Google via the App Engine Custom Domain settings. I wish those could just get reused for the load balancer instead.

https://cloud.google.com/appengine/docs/standard/python/securing-custom-domains-with-ssl#verify_a_managed_certificate_has_been_provisioned

Answer:

Did you create a new DNS resource record or change an existing one?

If you tried to resolve a resource record before you created it, the DNS server will return NXDOMAIN, which is called a Negative Response. Negative responses are cached by DNS Resolvers.

If you changed an existing resource record, what was the TTL?

DNS Resolvers use various strategies to decide how long to cache DNS resource records. One factor is the TTL.
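To make the TTL point concrete: one common approach (not something described in the question) is to lower the record's TTL ahead of the change and wait for the old TTL to elapse before switching the IP. The sketch below is just that arithmetic; the function name is hypothetical, and it assumes resolvers honor the TTL, which is not guaranteed.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds):
    """Earliest time by which caches holding the old TTL should have expired.

    Assumes resolvers respect the published TTL; some cache for longer.
    """
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


# Example: the TTL was lowered at 12:00 and the old TTL was one hour,
# so the record change should wait until at least 13:00.
cutover = earliest_safe_cutover(datetime(2024, 1, 1, 12, 0), 3600)
```

After that point, a resolver that re-fetches the record picks up the shorter TTL, so the later IP change should be seen sooner.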

Create/update the DNS resource record first

By creating the DNS resource record first, the validation attempt will not receive NXDOMAIN, so you avoid waiting for a negative response to expire from resolver caches. A domain typically has two to four authoritative DNS servers. When a new resource record is created, it takes time for the primary server to synchronize the record to the secondaries. This is typically only a minute or two.

Flush the public Google DNS Servers

If you have stale (changed) DNS resource records with long TTL values, flush the Google public DNS servers. This operation is not instant; plan to wait five minutes for it to complete.

Google: Flush Cache

Cloudflare: Flush Cache

Is there anything I can do to avoid/minimize this hour of downtime?

You cannot directly change the provisioning time. If you follow the above points, the provisioning time will be reduced. In my experience, ten minutes is typical for SSL certificate provisioning.
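Because the provisioning time cannot be controlled directly, it may be useful to poll the certificate status instead of guessing when it is ready. The following is a rough sketch that shells out to `gcloud compute ssl-certificates describe`; it assumes a global Google-managed certificate and an installed, authenticated gcloud CLI. The certificate name in the usage note, the polling interval, and the attempt limit are all placeholders.

```python
import subprocess
import time


def gcloud_cert_status(cert_name):
    """Return the managed status (e.g. PROVISIONING, ACTIVE) via gcloud."""
    result = subprocess.run(
        ["gcloud", "compute", "ssl-certificates", "describe", cert_name,
         "--global", "--format=get(managed.status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_active(cert_name, get_status=gcloud_cert_status,
                      interval_seconds=60, max_attempts=60, sleep=time.sleep):
    """Poll until the certificate is ACTIVE; return False if it never is.

    `get_status` and `sleep` are parameters so the loop can be tried
    without calling gcloud or actually waiting.
    """
    for attempt in range(max_attempts):
        if get_status(cert_name) == "ACTIVE":
            return True
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return False
```

Calling `wait_until_active("my-lb-cert")` (a hypothetical certificate name) would check once a minute for up to an hour, which matches the upper bound quoted from the docs in the question.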

Google Load Balancers require time to update after making changes. This time varies, but five to ten minutes is typical. This time is in addition to certificate provisioning. Your site may be unavailable during this time.

DNS server changes are not instantaneous. Your domain's DNS servers take time to update, DNS Resolvers on the Internet cache resource records, client systems cache records, etc. Plan ahead before making changes to DNS servers. Changes can take 24 to 72 hours to propagate globally.
