"libclntsh.so: cannot open shared object file in ubuntu to run python program in Spark Cluster


I have a Python program that runs locally without any issue, but when I run it on a Spark cluster (two nodes) I get an error about libclntsh.so.

To explain more: to run the program on the cluster, I first set the master IP address in spark-env.sh:

  export SPARK_MASTER_HOST=x.x.x.x

Then I list the IP addresses of the slave (worker) nodes in $SPARK_HOME/conf/workers (a sketch of that file follows the start commands below). After that, I start the master:

  /opt/spark/sbin/start-master.sh

Then I start the slaves:

  /opt/spark/sbin/start-slaves.sh
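
For reference, the workers file is just a list of node addresses, one per line. A minimal sketch (the IPs are placeholders):

  # $SPARK_HOME/conf/workers, one worker node per line
  x.x.x.1
  x.x.x.2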

Next, I check that the Spark UI is up, and then I submit the program from the master node:

  /opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files sparkConfig.json --py-files cst_utils.py,grouping.py,group_state.py,g_utils.py,csts.py,oracle_connection.py,config.py,brn_utils.py,emp_utils.py main.py  

When the above command is run, I receive this error:

   File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
process()
   File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 594, in process
out_iter = func(split_index, iterator)
   File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
   File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
   File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 418, in func
   File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2144, in combineLocally
   File "/opt/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 240, in mergeValues
   for k, v in iterator:
   File "/opt/spark/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
return f(*args, **kwargs)
   File "/opt/spark/work/app-20220221165611-0005/0/customer_utils.py", line 340, in read_cst
    df_group = connection.read_sql(query_cnt)
   File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 109, in read_sql
   self.connect()
   File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 40, in connect
     self.conn = cx_Oracle.connect(db_url)
     cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: 
     "libclntsh.so: cannot open shared object file: No such file or directory". 

I set these environment variables in ~/.bashrc:

    export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
    export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
    export PATH=$ORACLE_HOME:$PATH
    export JAVA_HOME=/usr/lib/jvm/java/jdk1.8.0_271
    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    export PATH=$PATH:$JAVA_HOME/bin
    export PYSPARK_PYTHON=/usr/bin/python3
    export PYSPARK_HOME=/usr/bin/python3.8
    export PYSPARK_DRIVER_PYTHON=python3.8

Would you please guide me on what is wrong?

Any help would be appreciated.

CodePudding user response:

Problem solved. Following the troubleshooting link, I first created a file named InstantClient.conf in /etc/ld.so.conf.d/ and wrote the path to the Instant Client directory in it:

  # instant client Path
  /usr/share/oracle/instantclient_19_8
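
For completeness, the same file can be created in one shell command (a sketch; the path is the one from my setup):

  echo "/usr/share/oracle/instantclient_19_8" | sudo tee /etc/ld.so.conf.d/InstantClient.conf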

Finally, I ran this command to rebuild the dynamic linker cache:

  sudo ldconfig
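
To confirm the library is now registered in the loader cache, you can grep the cache output (the exact version suffix may differ):

  ldconfig -p | grep libclntsh
  # expected output, roughly:
  # libclntsh.so.19.1 (libc6,x86-64) => /usr/share/oracle/instantclient_19_8/libclntsh.so.19.1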

Then I ran spark-submit again and it worked without the Instant Client error.
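
For anyone wondering why the exports in ~/.bashrc were not enough: the Spark daemons launched by the sbin scripts typically do not source ~/.bashrc, so an LD_LIBRARY_PATH set there never reaches the executor processes. An alternative to the ldconfig approach (a sketch, assuming the same Instant Client path on every node) is to export the path in $SPARK_HOME/conf/spark-env.sh, which Spark does source:

  # $SPARK_HOME/conf/spark-env.sh on every node (sketch)
  export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
  export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH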

Hope this helps others.
