Google App Engine throws error on Basic Scaling

I'm using Go and Google App Engine for a project. I had a task that received a huge file, split it into lines, and sent those lines one by one to a queue for processing. My initial scaling settings in the app.yaml file were the following:

instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 4
  min_idle_instances: 0
  max_idle_instances: 1
  target_cpu_utilization: 0.8
  min_pending_latency: 15s

It was working alright, but it had one issue: since there were a lot of tasks, the run would fail after 10 minutes (as per the documentation, of course). So I decided to use the B1 instance class instead of F1, and this is where things went wrong.

My setup for B1 looks like this:

instance_class: B1
basic_scaling:
  max_instances: 4

I've created a very simple demo to illustrate the idea:

// GET "foo" creates a task on queue "bar" with some body text.
r.GET("foo", func(c *gin.Context) {
    _, err := tm.CreateTask(&tasks.TaskOptions{
        QueueID:  "bar",
        Method:   "method",
        PostBody: "foooo",
    })
    if err != nil {
        lg.LogErrorAndChill("failed, %v", err)
    }
})

// POST "bar/method" is called by the task and just prints the body it receives.
r.POST("bar/method", func(c *gin.Context) {
    data, err := c.GetRawData()
    if err != nil {
        lg.LogErrorAndPanic("failed", err)
    }
    fmt.Printf("data is %v \n", string(data))
})

To explain the logic behind it: I send a request to "foo", which creates a task and adds it to the queue with some body text. The task then calls a POST endpoint derived from the QueueID and Method parameters; that endpoint receives the text and, in this simple example, just logs it.

When I run the request, I get a 500 error, which looks like this:

[GIN] 2021/10/05 - 19:38:29 | 500 |     301.289µs |         0.1.0.3 | GET      "/_ah/start"

And in the logs I can see:

Process terminated because it failed to respond to the start request with an HTTP status code of 200-299 or 404.

And inside the queue in the task (reason to retry):

INTERNAL(13): Instance Unavailable. HTTP status code 500

Now, I've read the documentation and I know about the following:

Manual, basic, and automatically scaling instances startup differently. When you start a manual scaling instance, App Engine immediately sends a /_ah/start request to each instance. When you start an instance of a basic scaling service, App Engine allows it to accept traffic, but the /_ah/start request is not sent to an instance until it receives its first user request. Multiple basic scaling instances are only started as necessary, in order to handle increased traffic. Automatically scaling instances do not receive any /_ah/start request.

When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have successfully started and can handle additional requests. Otherwise, App Engine terminates the instance. Manual scaling instances are restarted immediately, while basic scaling instances are restarted only when needed for serving traffic.

But this is not really helpful: I don't understand why the /_ah/start request does not get a proper response, and I'm not sure how to debug or fix it, especially since the F1 instance was working fine.

CodePudding user response：

Requests to the URL /_ah/start are routed to your app, and your app apparently is not ready to handle them, which leads to the 500 response. Check your logs.

Basically, your app needs to be ready to handle incoming requests to /_ah/start (the same way it is ready to handle requests to /foo). If you run the app locally, try opening that URL (via curl, etc.) and see what the response is. It needs to respond with a status code of 200–299 or 404 (as mentioned in the text you quoted); otherwise the instance won't be considered successfully started.