Upgrading Python version to 3.9 on macOS now gives “variable not defined” error for some file-readin

Time:10-27

I just upgraded from Python 3.7 to 3.9.14, and my code now fails with a "variable not defined" error. The same code worked fine locally before the upgrade, and it still works remotely, where Python 3.9.2 is installed, but locally under Python 3.9.14 it raises an error. Below is the code:

from multiprocessing import Pool

# Product, links and PRODUCT_POOL_COUNT are defined elsewhere in the original script.

def check(url):
    result = None
    product = Product(url, user_agents)
    if product.is_connected():
        result = product.parse()
    return result

if __name__ == '__main__':
    user_agents = []
    with open('user-agents.txt', encoding='utf8') as f:
        user_agents = f.readlines()
        if len(links) > 0:
            print('Starting with the Pool count = ', PRODUCT_POOL_COUNT)
            with Pool(PRODUCT_POOL_COUNT) as p:
                result = p.map(check, links)
                result = list(filter(None, result))  # Remove Empty

Below is the error message:

Traceback (most recent call last):
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/Me/Data/Clients/App/Etsy/products/parse_product.py", line 12, in check
    product = Product(url, user_agents)
NameError: name 'user_agents' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Me/Data/Clients/App/Etsy/products/parse_product.py", line 126, in <module>
    result = p.map(check, links)
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/Me/.pyenv/versions/3.9.14/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
NameError: name 'user_agents' is not defined

Answer:

You must be on macOS. On that OS, Python changed the default method for starting subprocesses between these versions: since Python 3.8, macOS uses "spawn" instead of "fork". That also explains why it "works on Python 3.9 remotely": the remote deployment must be on Linux or some Unix other than macOS.

Bear with me: the default child-process creation method used to be "fork" on all Unixes. With "fork", the new process is an exact copy of its parent, including every global variable already assigned, so the global variable user_agents exists and is visible in the target function.
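
For contrast, here is a minimal sketch of the fork behaviour, assuming a Unix system where the "fork" start method is available (it is not on Windows). The names user_agents, count_agents and run_with_fork are hypothetical stand-ins. The global is only filled in at run time by the parent, yet the forked workers still see it because each one is a copy of the parent.

```python
import multiprocessing as mp

# Module-level name that the parent only fills in at run time
# (hypothetical stand-in for the question's user_agents list).
user_agents = None


def count_agents(_url):
    # Runs in a worker process and reads the global as it exists in that process.
    return -1 if user_agents is None else len(user_agents)


def run_with_fork(urls):
    global user_agents
    user_agents = ["agent-1", "agent-2", "agent-3"]  # assigned before the pool starts
    ctx = mp.get_context("fork")  # Unix only; raises ValueError on Windows
    with ctx.Pool(2) as pool:
        # Workers are forked here, after the assignment, so they inherit the list.
        return pool.map(count_agents, urls)


if __name__ == "__main__":
    print(run_with_fork(["u1", "u2", "u3"]))  # [3, 3, 3]
```

Run as a script with mp.get_context("spawn") instead, the same function would return -1 for every URL, because each spawned child re-imports the module and only sees the None placeholder. Requesting "fork" explicitly would restore the Python 3.7 behaviour on macOS, but the Python documentation warns that fork should be considered unsafe there.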

The new default on macOS is "spawn": the new process starts a fresh interpreter and re-executes all of your main script's top-level code, except the lines guarded by the if __name__ == "__main__": statement. In the child processes, __name__ is not "__main__" (CPython re-imports the script under the name "__mp_main__"), because the child is no longer the main module of the running Python program; only the original process is. Since user_agents is assigned inside the guarded block, it is never created in the children.
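
A minimal, process-free way to see what the guard does in a spawned child: the sketch below runs a tiny stand-in script (hypothetical content, not the asker's file) with the standard runpy module, once as "__main__" like the parent and once as "__mp_main__", which is roughly how CPython's spawn start method re-runs the main script in a child. The helper name run_script_as is made up for illustration.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for the question's script: the global is assigned
# only inside the __main__ guard, like user_agents in the question.
SCRIPT = textwrap.dedent("""\
    if __name__ == "__main__":
        user_agents = ["agent-1", "agent-2"]
""")


def run_script_as(run_name):
    """Write SCRIPT to a temp file, execute it as a module named run_name,
    and return the resulting module globals."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SCRIPT)
        return runpy.run_path(path, run_name=run_name)


# The parent runs the script as "__main__", so the guarded block executes.
parent_globals = run_script_as("__main__")
# A spawned child re-runs it under another name, so the guarded block is skipped.
child_globals = run_script_as("__mp_main__")

print("user_agents" in parent_globals)  # True
print("user_agents" in child_globals)   # False
```

Moving the assignment above the if statement, as the answer suggests, makes user_agents appear in both dictionaries.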

Setting the motivations for this change aside (they are easily searchable), the fix is simple: define your global variables outside the guarded block:

(...)
# Module level: runs in the parent and again in every spawned child
user_agents = open("user-agents.txt", encoding="utf-8").readlines()
if __name__ == '__main__':
    if len(links) > 0:
        print('Starting with the Pool count = ', PRODUCT_POOL_COUNT)
        with Pool(PRODUCT_POOL_COUNT) as p:
            result = p.map(check, links)
            result = list(filter(None, result))  # Remove Empty

(Also, there is no need for the whole "with open" block if you are reading the file in one go.)
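
Defining the global at module level works, but every spawned child then re-reads user-agents.txt when it re-imports the script. An alternative (a different technique, not part of the original answer) is to pass the list to the worker explicitly with functools.partial, so it is pickled and shipped with the tasks regardless of the start method. In this sketch check, make_task and run are hypothetical stand-ins; the real worker would build Product(url, user_agents).

```python
import functools
import pickle
from multiprocessing import Pool


def check(url, user_agents):
    # Hypothetical worker: the real one would build Product(url, user_agents)
    # and return product.parse(); this one just reports what it received.
    return f"{url}: {len(user_agents)} user agents"


def make_task(user_agents):
    # Bind the list to the worker so it travels with each task instead of
    # being looked up as a module-level global inside the child process.
    return functools.partial(check, user_agents=user_agents)


def run(links, user_agents, pool_count):
    # Nothing here relies on the child re-creating globals, so it behaves
    # the same under the "fork" and "spawn" start methods.
    with Pool(pool_count) as p:
        return list(filter(None, p.map(make_task(user_agents), links)))


# Pool.map pickles the callable to send it to the workers; under "spawn"
# that pickle is how the list reaches the child. Simulate the round trip:
task = pickle.loads(pickle.dumps(make_task(["agent-1", "agent-2"])))
print(task("https://example.com/a"))  # https://example.com/a: 2 user agents
```

In the question's script the call would become p.map(make_task(user_agents), links) inside the guarded block. The list is pickled again for every chunk of work; for a large list, the initializer and initargs parameters of Pool are another way to hand it to each worker once.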
