Django app on Google App Engine Standard slows down after 15 requests

I deployed a Django app on Google App Engine Standard with an F4 instance class. The API does some machine learning, and the processing takes around 4s locally (between 3.5 and 4s). For my use case, delays of 4-5s with the deployed app would also be fine. However, when I test the deployed app by sending the same request multiple times, the first requests take 3-4 seconds, but after 10-15 iterations they take around 8s.

Here is the code I used to test my app:

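import time

import requests

# url_api, headers and data are defined elsewhere (endpoint URL, request headers, JSON payload)
session = requests.Session()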

all_times = []
for i in range(50):
    try:
        t0 = time.time()
        resp = session.post(url_api, headers=headers, json=data)
        t1 = time.time()
        print(t1 - t0)
        all_times.append(t1 - t0)
    except Exception as e:
        print("Err", e)

The results of the request durations are as follows:

I wonder why there is this gap after 15 requests and why some points are way above the average 7-8s (e.g. 10s).
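
To put a number on where the jump happens instead of reading it off the plot, the collected durations can be summarized in consecutive windows. This is only a sketch: the helper names, the window size of 5, the 1.5x threshold and the sample values are all assumptions for illustration, not measurements from the app.

```python
import statistics


def summarize_in_windows(durations, window=5):
    """Return (start_index, median, max) for each consecutive window of durations."""
    summary = []
    for start in range(0, len(durations), window):
        chunk = durations[start:start + window]
        summary.append((start, statistics.median(chunk), max(chunk)))
    return summary


def first_slow_window(durations, window=5, factor=1.5):
    """Return the start index of the first window whose median exceeds
    `factor` times the median of the first window, or None if there is none."""
    summary = summarize_in_windows(durations, window)
    if not summary:
        return None
    baseline = summary[0][1]
    for start, median, _ in summary:
        if median > factor * baseline:
            return start
    return None


# Illustrative numbers only (hypothetical, shaped like the pattern described above)
sample = [3.6, 3.8, 3.5, 3.9, 3.7] * 3 + [7.8, 8.1, 10.2, 7.9, 8.0] * 2

for start, median, worst in summarize_in_windows(sample):
    print(f"from request {start}: median {median:.1f}s, max {worst:.1f}s")
print("first slow window starts at request", first_slow_window(sample))
```

With the real measurements, `all_times` from the test loop would be passed in place of `sample`.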

I get the same pattern looking at the latency in the Google Cloud Console:

What I tried

I tried changing the autoscaling parameters, thinking that the slowdown could be caused by new instances being created. But I got the same pattern when I restricted the number of instances to 1 with this in app.yaml:

automatic_scaling:
  max_instances: 1

I also tried to:

Use a Session in requests to avoid creating a new session at each iteration.
Send my requests with curl, which leads to the same results.

My goal is to minimize the variability and keep the request durations below 5s.

Update

Here is an example of the trace list from Cloud Trace, which shows the same pattern:

I compared traces associated with a large latency and traces with a lower latency, but didn't find any important difference.

Update 2

Thanks to Priyashree's answer, setting min_idle_instances to 1 to avoid instance restarts solved the problem.
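
The exact snippet is not preserved here. A minimal sketch of the setting, assuming the same app.yaml layout as the max_instances example above (any other scaling keys in the real file are unknown):

```yaml
# app.yaml fragment (sketch): keep one idle instance warm between requests
automatic_scaling:
  min_idle_instances: 1
```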

CodePudding user response:

When comparing performance on Google Cloud Platform with local performance, keep in mind that a newly started instance on GCP needs extra time to import all the necessary libraries and set up the Django framework before it can serve requests.

In general it doesn't make much sense to compare performance on your local machine with performance on App Engine, as your local machine is likely running a different OS on different hardware. But yes, I agree that the jump in latency for requests served by GAE after 10-15 requests is not acceptable and quite strange.

Check the following:

Whether the instance has exceeded the maximum memory for its configured instance_class (F4 in your case). That can cause App Engine to shut the instance down and create a new one in its place. Since you have set max_instances to 1, no other instance is available in the meantime, and the extra delay is about the time it takes for a new instance to start.
Whether you have set target_cpu_utilization. It specifies the CPU usage threshold at which new instances are started to handle traffic.
The interval at which you send requests. It matters because, as you mentioned in your comments, you are making the requests one by one. When request volume decreases, App Engine reduces the number of instances. When an application is not being used at all, App Engine turns off its dynamic instances and reloads them as soon as they are needed. Reloading instances adds latency for users. To make sure you are not reloading instances, specify a minimum number of idle instances (min_idle_instances). With an appropriate number of idle instances for your request volume, your application can serve every request with little latency, unless request volume is abnormally high.

Also, per the documentation, if instance_class is set to F2 or higher, you can optimize your instances by setting max_concurrent_requests to a value higher than the default of 10. To find the optimal value, increase it gradually and monitor the performance of your application.
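
As a rough sketch, the scaling settings discussed above would sit in app.yaml like this. The numbers are illustrative starting points chosen for this example, not recommendations, and the rest of the file (runtime, handlers, and so on) is omitted:

```yaml
# app.yaml fragment (sketch); values below are assumptions for illustration
instance_class: F4
automatic_scaling:
  target_cpu_utilization: 0.6    # CPU usage threshold at which new instances are started
  max_concurrent_requests: 20    # above the default of 10; raise gradually and monitor
```

For CPU-bound work such as model inference, serving more requests concurrently on one instance may increase per-request latency, so measure after each change.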

PageSpeed Insights could also be handy: it analyzes the content of a web page and generates suggestions to make that page faster. I would also suggest contacting Google Support for a 1:1 interaction, as this issue is environment-specific.