I built a scraper that collects data from a page, formats it and adds it to a database. It then uses the scraped data to build models, except for one value that it scrapes. Everything is wrapped in Celery so that tasks run in the background.
@router.post("/run/{id}")
async def create(id: str):
    wallet_reputation.delay(id)
    return {"Status": "Task successfully added for execution"}
The endpoint above works fine. Each ID passed to it is unique, and there are about 100 such IDs. To automate building a model for every ID, I created the endpoint below, which I call from time to time (the scraped data changes, so I need to update my models).
@router.post("/run")
async def create_all():
    # Enqueue one Celery task per address returned by the generator
    for address in all_addresses_generator():
        wallet_reputation.delay(address)
    return {"Status": "Tasks successfully added for execution"}
I receive the following error:
2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException('', None, ['#0 0x556bcd4bc7d3 <unknown>', '#1 0x556bcd218688 <unknown>', '#2 0x556bcd24ec21 <unknown>', '#3 0x556bcd24ede1 <unknown>', '#4 0x556bcd281d74 <unknown>', '#5 0x556bcd26c6dd <unknown>', '#6 0x556bcd27fa0c <unknown>', '#7 0x556bcd26c5a3 <unknown>', '#8 0x556bcd241ddc <unknown>', '#9 0x556bcd242de5 <unknown>', '#10 0x556bcd4ed49d <unknown>', '#11 0x556bcd50660c <unknown>', '#12 0x556bcd4ef205 <unknown>', '#13 0x556bcd506ee5 <unknown>', '#14 0x556bcd4e3070 <unknown>', '#15 0x556bcd522488 <unknown>', '#16 0x556bcd52260c <unknown>', '#17 0x556bcd53bc6d <unknown>', '#18 0x7f8e32957609 <unknown>', ''])
2022-03-26T15:26:02.875723+00:00 app[worker.1]: Traceback (most recent call last):
2022-03-26T15:26:02.875724+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
2022-03-26T15:26:02.875724+00:00 app[worker.1]: R = retval = fun(*args, **kwargs)
2022-03-26T15:26:02.875724+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
2022-03-26T15:26:02.875725+00:00 app[worker.1]: return self.run(*args, **kwargs)
2022-03-26T15:26:02.875725+00:00 app[worker.1]: File "/app/tasks.py", line 40, in wallet_reputation
2022-03-26T15:26:02.875725+00:00 app[worker.1]: WalletReputation(id).add_reputation_to_db()
2022-03-26T15:26:02.875727+00:00 app[worker.1]: File "/app/agents/walletReputation.py", line 261, in add_reputation_to_db
2022-03-26T15:26:02.875727+00:00 app[worker.1]: nc_balance=self.nc_balance(),
2022-03-26T15:26:02.875727+00:00 app[worker.1]: File "/app/agents/walletReputation.py", line 162, in nc_balance
2022-03-26T15:26:02.875727+00:00 app[worker.1]: WebDriverWait(self.driver, 20)
2022-03-26T15:26:02.875727+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 89, in until
2022-03-26T15:26:02.875728+00:00 app[worker.1]: raise TimeoutException(message, screen, stacktrace)
2022-03-26T15:26:02.875728+00:00 app[worker.1]: selenium.common.exceptions.TimeoutException: Message:
2022-03-26T15:26:02.875729+00:00 app[worker.1]: Stacktrace:
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #0 0x556bcd4bc7d3 <unknown>
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #1 0x556bcd218688 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #2 0x556bcd24ec21 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #3 0x556bcd24ede1 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #4 0x556bcd281d74 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #5 0x556bcd26c6dd <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #6 0x556bcd27fa0c <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #7 0x556bcd26c5a3 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #8 0x556bcd241ddc <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #9 0x556bcd242de5 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #10 0x556bcd4ed49d <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #11 0x556bcd50660c <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #12 0x556bcd4ef205 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #13 0x556bcd506ee5 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #14 0x556bcd4e3070 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #15 0x556bcd522488 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #16 0x556bcd52260c <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #17 0x556bcd53bc6d <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #18 0x7f8e32957609 <unknown>
I don't understand why I suddenly get this error when the previous endpoint, which runs the same Celery task, works normally. Below is the code of the generator and of the class method where the error is raised.
def all_addresses_generator():
    for row in session.query(DbNcTransaction).all():
        yield row.to
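from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def nc_balance(self):  # method of WalletReputation
    base_url = "https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d?a="
    self.driver.get(base_url + self.address)
    nc_balance = (
        WebDriverWait(self.driver, 20)
        .until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "#ContentPlaceHolder1_divFilteredHolderBalance")
            )
        )
        .text
    )
    # Take the second whitespace-separated token and drop thousands separators
    nc_balance = nc_balance.split()[1]
    nc_balance = round(float(nc_balance.replace(",", "")), 2)
    return nc_balance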
How can I deal with this?
Answer:
The issue is not (initially) with Selenium raising TimeoutException, but with Heroku raising the R14 (Memory quota exceeded) error, as shown in the second line of the error log you provided. The RAM usage of your application has exceeded the available quota. Since you are using a free dyno, the maximum RAM (quota) is 512 MB (see here). However, as the first line of the error log shows (Process running mem=543M(104.1%)), your application requires more than that amount.
Thus, you could try reducing the number of workers (if you are using more than one), reducing the RAM usage of your app, or upgrading to a different Heroku dyno (see How do I upgrade from Heroku's free tier).
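A minimal sketch of the first option, assuming the worker reads its settings from a separate module (the name celeryconfig.py and both values are assumptions to tune). The ForkPoolWorker-8 in the log suggests the prefork pool runs at least eight processes (Celery's default concurrency is the number of CPU cores); if each task drives its own browser, that many browsers can be alive at once on a 512 MB dyno.

```python
# celeryconfig.py (hypothetical module name); load it with
# app.config_from_object("celeryconfig") on your Celery app.

# Run one task at a time, so only one scraping task
# (and the browser it drives) uses memory at any moment.
worker_concurrency = 1

# Replace each pool process after 10 tasks, so memory it has
# accumulated is released back to the operating system.
worker_max_tasks_per_child = 10
```

The same concurrency limit can be passed on the command line with celery -A tasks worker --concurrency=1 (assuming the Celery app lives in tasks.py). Lower concurrency trades throughput for memory headroom.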
Update
Additionally, it would be preferable to instantiate WebDriverWait once (at startup) rather than on every call (you may also need to increase the timeout value passed to WebDriverWait):
wait = WebDriverWait(driver, 10)
and then use it as:
nc_balance = wait.until(....
Answer:
This error message...
2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException
...implies that the TimeoutException raised in ForkPoolWorker-8 was a consequence of your program exceeding the memory quota: once the dyno starts swapping, the browser can plausibly slow down enough that the 20-second WebDriverWait in nc_balance expires.
Deep Dive
This is a classic out-of-memory error: memory usage has exceeded the maximum allowed level.
Process running mem=543M(104.1%)
At 543M, memory usage is at 104.1% of the quota, so judging by the Dyno memory specs you are presumably using one of the dyno types with 512 MB: free, hobby or standard-1x.
Dynos
The Heroku Platform uses a container model to run and scale all Heroku apps; these containers are called dynos. Dynos are isolated, virtualized Linux containers designed to execute code based on a user-specified command. Apps can scale to any specified number of dynos based on their resource demands.
Error R14 (Memory quota exceeded)
At times a dyno may require more memory than its assigned quota. In those cases the dyno pages to swap space to keep running, which can degrade process performance. This is when the R14 error starts appearing; it is calculated from total memory (swap, RSS and cache), as in this example:
2011-05-03T17:40:10+00:00 app[worker.1]: Working
2011-05-03T17:40:10+00:00 heroku[worker.1]: Process running mem=1028MB(103.3%)
2011-05-03T17:40:11+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2011-05-03T17:41:52+00:00 app[worker.1]: Working
Resolving R14 memory error
In these scenarios you will want your application to use less memory. Its memory usage depends on:
- the number of threads
- the largest possible request
- the distribution of incoming requests
To reduce it, you can:
- decrease the thread count to reduce your memory needs (but this may lower your throughput)
- add capacity by scaling out, e.g. adding more dynos/servers
Adding capacity generally works well: as more servers/dynos come into operation, requests are spread out, and it becomes less likely that all threads on a single machine are processing the largest request at the same time. In the long run, however, the best way to reduce your overall memory requirement is to reduce object allocation.
This use case
In this use case, the first endpoint, def create(id: str), processes one ID per call and your application copes with it, but when def create_all() enqueues tasks for all of the roughly 100 IDs at once, you start seeing the error.
Solution
Instead of building the models for all IDs in one go, you can take a different approach: if possible, divide the IDs into batches, each containing a number of models small enough that memory usage doesn't cross the threshold.
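One way to sketch this batching, assuming wallet_reputation is the Celery task from the question: the hypothetical helpers below split the addresses into fixed-size batches and use Celery's apply_async with a countdown so that each batch becomes due a fixed number of seconds after the previous one. The helper names, batch size and spacing are placeholders to tune against the 512 MB quota.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def enqueue_in_batches(task, addresses: Iterable[str],
                       batch_size: int = 10,
                       seconds_between_batches: int = 120) -> int:
    """Schedule batch N to become due N * seconds_between_batches from now.

    `task` is expected to be a Celery task such as wallet_reputation.
    Returns the number of tasks enqueued.
    """
    count = 0
    for index, batch in enumerate(batched(addresses, batch_size)):
        for address in batch:
            task.apply_async(args=(address,), countdown=index * seconds_between_batches)
            count += 1
    return count
```

For example, create_all could call enqueue_in_batches(wallet_reputation, all_addresses_generator()) instead of calling delay in a loop; with about 100 IDs and the defaults, the last batch is due after 18 minutes. This spreads the work over time but does not cap how many tasks run at once, which is still governed by the worker's concurrency. Countdown tasks are held by the worker until they are due, and if the broker is Redis, countdowns longer than its visibility timeout (one hour by default) can lead to redelivery, so keep the total spread below that.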