I just upgraded from Python 3.7 to 3.9.14, and my script now fails with a "variable not defined" error (a NameError). The same code worked locally before the upgrade and still works on a remote machine where Python 3.9.2 is installed, but locally on Python 3.9.14 it now fails. Below is the code:
def check(url):
    result = None
    product = Product(url, user_agents)
    if product.is_connected():
        result = product.parse()
    return result

if __name__ == '__main__':
    user_agents = []
    with open('user-agents.txt', encoding='utf8') as f:
        user_agents = f.readlines()
    if len(links) > 0:
        print('Starting with the Pool count = ', PRODUCT_POOL_COUNT)
        with Pool(PRODUCT_POOL_COUNT) as p:
            result = p.map(check, links)
        result = list(filter(None, result))  # Remove Empty
Below is the error message:
Traceback (most recent call last):
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/Me/Data/Clients/App/Etsy/products/parse_product.py", line 12, in check
    product = Product(url, user_agents)
NameError: name 'user_agents' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Me/Data/Clients/App/Etsy/products/parse_product.py", line 126, in <module>
    result = p.map(check, links)
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
NameError: name 'user_agents' is not defined
CodePudding user response:
You must be on macOS. Between the versions you mention (the change landed in Python 3.8), Python changed the default method for starting child processes on that OS. That also explains why it "works on Python 3.9 remotely": the remote deployment must be running Linux or some Unix other than macOS.
Bear with me:
The default child-process creation method used to be "fork" on all Unixes. When "fork" is used, the new process is an exact copy of its parent, including all global variables already defined, so the global variable user_agents exists and is visible in the target function.
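To confirm that the two machines really differ in this respect, a small check like the following can be run on each of them. It is only a sketch: the function name start_method_report is made up for illustration, while get_start_method and get_all_start_methods are standard multiprocessing calls.

```python
import multiprocessing as mp
import platform


def start_method_report():
    """Summarise the interpreter version, OS, and default start method."""
    return {
        "python": platform.python_version(),
        "system": platform.system(),  # "Darwin" on macOS, "Linux" on Linux
        "start_method": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in start_method_report().items():
        print(f"{key}: {value}")
```

If the local machine reports a different start_method than the remote one, the same script can behave differently even on the same Python minor version.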
The new default on macOS is "spawn": the new process starts a fresh interpreter and re-imports your main module, re-executing every line except those guarded by the if __name__ == "__main__": statement. In the child processes, the variable __name__ is set to "__mp_main__" rather than "__main__", because the module is no longer the __main__ module of the running Python program (only the original process is).
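The effect can be reproduced without starting any processes. The sketch below assumes that a spawned child re-runs the main script under the module name "__mp_main__", which is how CPython's multiprocessing does it. The script text and the file name parse_product_demo.py are made up for illustration.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for the script in the question.
SCRIPT = '''
def check(url):
    return (url, user_agents)

if __name__ == "__main__":
    user_agents = ["agent-1"]
'''


def globals_after_running(run_name):
    """Execute SCRIPT as a module called run_name and return its globals."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "parse_product_demo.py")
        with open(path, "w", encoding="utf8") as f:
            f.write(SCRIPT)
        return runpy.run_path(path, run_name=run_name)


# How the parent process runs the script.
parent_globals = globals_after_running("__main__")
# Module name assumed for the re-import inside a spawned child.
child_globals = globals_after_running("__mp_main__")

print("user_agents" in parent_globals)  # True
print("user_agents" in child_globals)   # False
```

In both runs check is defined, but only the run named "__main__" executes the guarded block, so only that one ends up with a user_agents global.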
Motivations for this change aside (they are easy to look up), the fix is simple: just define your global variables outside of the guarded block:
(...)
user_agents = open("user-agents.txt", encoding="utf-8").readlines()

if __name__ == '__main__':
    if len(links) > 0:
        print('Starting with the Pool count = ', PRODUCT_POOL_COUNT)
        with Pool(PRODUCT_POOL_COUNT) as p:
            result = p.map(check, links)
        result = list(filter(None, result))  # Remove Empty
(Also, there is no need for the whole "with open" block if you are reading the file in a single go.)
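For trying the fix in isolation, here is a minimal self-contained sketch of the same idea. Several things in it are assumptions and not part of the original script: the value of PRODUCT_POOL_COUNT, the placeholder links, the load_user_agents helper (which returns an empty list when the file is missing and strips newlines, unlike readlines), and the body of check, which stands in for the Product class.

```python
from multiprocessing import Pool
from pathlib import Path

PRODUCT_POOL_COUNT = 2                      # hypothetical value
USER_AGENTS_FILE = Path("user-agents.txt")  # file name from the question


def load_user_agents(path):
    """Read the file in one go; return [] if it does not exist."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf8").splitlines()
    return [line for line in lines if line.strip()]


# Module level: executed in the parent and again in every spawned child,
# so the name exists wherever check() runs.
user_agents = load_user_agents(USER_AGENTS_FILE)


def check(url):
    # Stand-in for Product(url, user_agents).parse() from the question.
    return (url, len(user_agents))


if __name__ == "__main__":
    links = ["https://example.com/a", "https://example.com/b"]  # placeholders
    if len(links) > 0:
        with Pool(PRODUCT_POOL_COUNT) as p:
            result = p.map(check, links)
        result = list(filter(None, result))  # remove empty results
        print(result)
```

Note that with "spawn" the module-level read happens once per worker process, which is fine for a small text file but worth keeping in mind for anything expensive.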