Neo4j python API crashing on multiple queries

Time:04-28

I'm trying to create edges with the Neo4j Python API against a Neo4j Docker image. Basically, I launch the following script:

import tqdm
from neo4j import GraphDatabase
server='bolt://localhost:7687'
usr="neo4j"
pwd="jdl"

driver = GraphDatabase.driver(server, auth=(usr, pwd))
for line in tqdm.tqdm(big_mat[0:1000]):
        query_line=f"""MATCH (s:Sample)-[r]->(m:Mineral)
        WHERE s.id='{line[0]}' AND m.name='{line[2]}'
        SET r.amount_weighted={line[1]}
        """
        driver.session().run(query_line)

For some reason it runs a variable number of iterations (roughly 70 to 100) and then crashes with the following error, which I can't make sense of:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
Input In [257], in <cell line: 6>()
      6 for line in tqdm.tqdm(big_mat[0:1000]):
      7         query_line=f"""MATCH (s:Sample)-[r]->(m:Mineral)
      8         WHERE s.id='{line[0]}' AND m.name='{line[2]}'
      9         SET r.amount_weighted={line[1]}
     10         """
---> 11         driver.session().run(query_line)

File ~/anaconda3/envs/Jdl_geochemical/lib/python3.9/site-packages/neo4j/work/simple.py:204, in Session.run(self, query, parameters, **kwparameters)
    201     self._autoResult._buffer_all()  # This will buffer upp all records for the previous auto-transaction
    203 if not self._connection:
--> 204     self._connect(self._config.default_access_mode)
    205 cx = self._connection
    206 protocol_version = cx.PROTOCOL_VERSION

File ~/anaconda3/envs/Jdl_geochemical/lib/python3.9/site-packages/neo4j/work/simple.py:108, in Session._connect(self, access_mode)
    106 if access_mode is None:
    107     access_mode = self._config.default_access_mode
--> 108 super()._connect(access_mode)

File ~/anaconda3/envs/Jdl_geochemical/lib/python3.9/site-packages/neo4j/work/__init__.py:79, in Workspace._connect(self, access_mode)
     66     else:
     67         # This is the first time we open a connection to a server in a
     68         # cluster environment for this session without explicitly
   (...)
     71         # we shall use this database explicitly for all subsequent
     72         # actions within this session.
     73         self._pool.update_routing_table(
     74             database=self._config.database,
     75             imp_user=self._config.impersonated_user,
     76             bookmarks=self._bookmarks,
     77             database_callback=self._set_cached_database
     78         )
---> 79 self._connection = self._pool.acquire(
     80     access_mode=access_mode,
     81     timeout=self._config.connection_acquisition_timeout,
     82     database=self._config.database,
     83     bookmarks=self._bookmarks
     84 )
     85 self._connection_access_mode = access_mode

File ~/anaconda3/envs/Jdl_geochemical/lib/python3.9/site-packages/neo4j/io/__init__.py:842, in BoltPool.acquire(self, access_mode, timeout, database, bookmarks)
    840 def acquire(self, access_mode=None, timeout=None, database=None, bookmarks=None):
    841     # The access_mode and database is not needed for a direct connection, its just there for consistency.
--> 842     return self._acquire(self.address, timeout)

File ~/anaconda3/envs/Jdl_geochemical/lib/python3.9/site-packages/neo4j/io/__init__.py:715, in IOPool._acquire(self, address, timeout)
    710     # if timed out, then we throw error. This time
    711     # computation is needed, as with python 2.7, we
    712     # cannot tell if the condition is notified or
    713     # timed out when we come to this line
    714     if not time_remaining():
--> 715         raise ClientError("Failed to obtain a connection from pool "
    716                           "within {!r}s".format(timeout))
    717 else:
    718     raise ClientError("Failed to obtain a connection from pool "
    719                       "within {!r}s".format(timeout))

ClientError: {code: None} {message: None}

Am I trying to upload edges in the wrong way?

EDIT:

I found that if I define the session outside the for loop, the script completes successfully. Most examples I found run the query the former way, so maybe this will help someone.

import tqdm
from neo4j import GraphDatabase
server='bolt://localhost:7687'
usr="neo4j"
pwd="jdl"

driver = GraphDatabase.driver(server, auth=(usr, pwd))
session=driver.session()
for line in tqdm.tqdm(big_mat[0:1000]):
        query_line=f"""MATCH (s:Sample)-[r]->(m:Mineral)
        WHERE s.id='{line[0]}' AND m.name='{line[2]}'
        SET r.amount_weighted={line[1]}
        """
        session.run(query_line)

Answer:

The problem is that sessions, like drivers, have a lifetime you have to manage, much like open files. If you open a driver, you have to close it afterwards, and the same holds for sessions.

Your code could look like this:

import tqdm
from neo4j import GraphDatabase


uri = "neo4j://localhost:7687"
user = "neo4j"
password = "jdl"

driver = GraphDatabase.driver(uri, auth=(user, password))
try:
    for line in tqdm.tqdm(big_mat[0:1000]):
        query_line = f"""MATCH (s:Sample)-[r]->(m:Mineral)
        WHERE s.id='{line[0]}' AND m.name='{line[2]}'
        SET r.amount_weighted={line[1]}
        """
        session = driver.session()
        try:
            session.run(query_line)
        finally:
            session.close()
finally:
    driver.close()

Or, more neatly, with context managers:

...

with GraphDatabase.driver(uri, auth=(user, password)) as driver:
    for line in tqdm.tqdm(big_mat[0:1000]):
        query_line = ...
        with driver.session() as session:
            session.run(query_line)
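
As a variation on the loop above, here is a minimal sketch that reuses one session for the whole batch (as in the question's edit) but still closes it via `with`, and passes the values as Cypher `$` parameters instead of formatting them into the query string. The helper name `update_amounts` and the parameter names are made up for illustration, and it assumes each row is `(sample_id, amount, mineral_name)` holding plain Python values, matching `line[0]`, `line[1]`, `line[2]` in the question.

```python
# Hypothetical helper: `driver` is expected to be a neo4j driver created
# with GraphDatabase.driver(...) as shown above.
QUERY = """
MATCH (s:Sample)-[r]->(m:Mineral)
WHERE s.id = $sample_id AND m.name = $mineral
SET r.amount_weighted = $amount
"""


def update_amounts(driver, rows):
    """Run one parameterised update per row, reusing a single session.

    Assumes each row is (sample_id, amount, mineral_name).
    Returns the number of queries that were run.
    """
    count = 0
    # The session is closed, and its connection handed back to the pool,
    # when the `with` block exits, even if a query raises.
    with driver.session() as session:
        for sample_id, amount, mineral in rows:
            result = session.run(
                QUERY,
                {"sample_id": sample_id, "mineral": mineral, "amount": amount},
            )
            # Consume the result so that any server-side error surfaces here.
            result.consume()
            count += 1
    return count
```

It would be called as `update_amounts(driver, big_mat[0:1000])`. Whether one session per batch or one per query is preferable depends on the workload; either way, the point is that every session gets closed.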

On a more in-depth level: opening a session and executing work on it makes the session borrow a connection from the connection pool the driver maintains for you. Each session holds at most one connection, so if you keep opening sessions without closing them, you will eventually exhaust the pool. The next session that requests a connection then waits until the acquisition timeout expires and fails with the "Failed to obtain a connection from pool" error shown in the traceback.
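
The mechanism can be illustrated without a database. The following is a toy model only, not the driver's actual implementation: `TinyPool` and `TinySession` are invented names, the pool size of 3 and the 0.01 s timeout are arbitrary, and sessions are deliberately kept alive in a list to mimic sessions that are never closed. As far as I know the real driver defaults to a pool of 100 connections, which would be roughly consistent with the crash appearing after about a hundred iterations.

```python
import threading


class TinyPool:
    """Toy stand-in for a driver's connection pool."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self, timeout):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError(
                f"failed to obtain a connection from pool within {timeout!r}s"
            )
        return object()  # placeholder for a real connection

    def release(self):
        self._slots.release()


class TinySession:
    """Borrows one connection on first use, returns it on close()."""

    def __init__(self, pool, timeout=0.01):
        self._pool = pool
        self._timeout = timeout
        self._conn = None

    def run(self, query):
        if self._conn is None:
            self._conn = self._pool.acquire(self._timeout)

    def close(self):
        if self._conn is not None:
            self._conn = None
            self._pool.release()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def leaky(pool, n):
    """Open up to n sessions without closing them; return how many ran."""
    held = []  # references kept alive, like sessions that are never closed
    try:
        for _ in range(n):
            session = TinySession(pool)
            session.run("...")
            held.append(session)
    except TimeoutError:
        pass  # the pool is exhausted
    done = len(held)
    for session in held:
        session.close()  # clean up so the pool can be reused
    return done


def careful(pool, n):
    """Open n sessions, closing each one before opening the next."""
    done = 0
    for _ in range(n):
        with TinySession(pool) as session:
            session.run("...")
            done += 1
    return done


print(leaky(TinyPool(3), 10))    # 3: the fourth session times out
print(careful(TinyPool(3), 10))  # 10: each connection is returned in time
```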
