I need to push some data from Databricks on AWS to SAP Data Warehouse Cloud, and have been encouraged to use the Python hdbcli package (https://pypi.org/project/hdbcli/). The only documentation I have been able to find is the one on PyPI, which is quite sparse. I can see an example of how to insert individual rows into an SQL table, but I have found no examples of how to save a PySpark DataFrame to a table in SAP Data Warehouse Cloud.
Documentation examples:
sql = 'INSERT INTO T1 (ID, C2) VALUES (:id, :c2)'
cursor = conn.cursor()
id = 3
c2 = "goodbye"
cursor.execute(sql, {"id": id, "c2": c2})
# returns True
cursor.close()
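(For completeness, the example assumes an already open connection; I create mine roughly like this, with placeholder host and credentials:)
from hdbcli import dbapi
# Open a connection to SAP Data Warehouse Cloud / HANA Cloud
# (address, user and password below are placeholders)
conn = dbapi.connect(
    address="<your-dwc-host>.hana.ondemand.com",
    port=443,                        # HANA Cloud / DWC SQL endpoint port
    user="<user>",
    password="<password>",
    encrypt=True,                    # TLS is required for HANA Cloud
    sslValidateCertificate=True,
)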
I have tried the following in my Databricks notebook:
df.createOrReplaceTempView("final_result_local")
sql = "INSERT INTO final_result SELECT * FROM final_result_local"
cursor.execute(sql)
cursor.close()
After this I got the following error:
invalid table name: Could not find table/view FINAL_RESULT_LOCAL in schema DATABRICKS_SCHEMA
It seems df.createOrReplaceTempView created the temporary view in Spark's own catalog rather than in the database context that hdbcli queries, and I don't know how to push the local table to SAP Data Warehouse Cloud. Any help would be much appreciated.
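The only approach I can think of with plain hdbcli is to collect the rows on the driver and insert them with executemany, roughly like the sketch below (the table and column names are placeholders, and collecting everything to the driver probably does not scale):
# Sketch: push a (small) PySpark DataFrame through hdbcli
# FINAL_RESULT is assumed to already exist in the target schema
rows = [tuple(r) for r in df.collect()]   # pulls all rows to the driver
sql = "INSERT INTO FINAL_RESULT (ID, C2) VALUES (?, ?)"   # placeholder columns
cursor = conn.cursor()
cursor.executemany(sql, rows)
conn.commit()
cursor.close()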
CodePudding user response:
I disagree with using hdbcli. Instead look into connecting from Spark directly; this instruction should be helpful.
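For example, with the SAP HANA JDBC driver (ngdbc.jar) attached to the cluster, a Spark DataFrame can usually be written straight to the target table via the JDBC data source; the host, schema and table names below are placeholders:
# Sketch: write a PySpark DataFrame directly over JDBC (assumes ngdbc.jar is on the cluster)
(df.write
   .format("jdbc")
   .option("driver", "com.sap.db.jdbc.Driver")
   .option("url", "jdbc:sap://<your-dwc-host>.hana.ondemand.com:443/?encrypt=true")
   .option("dbtable", "DATABRICKS_SCHEMA.FINAL_RESULT")   # placeholder schema.table
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("append")
   .save())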
CodePudding user response:
You should consider using the Python machine learning client for SAP HANA (hana-ml). You can think of it as an abstraction layer on top of hdbcli. The central object for sending and retrieving data is the HANA dataframe, which behaves similarly to a Pandas dataframe but is persisted on the database side (i.e. it can be a table).
For your scenario, you should be able to create a HANA dataframe, and thus a table, using the function create_dataframe_from_spark() (see documentation).
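A minimal sketch of what that could look like (connection details are placeholders, and parameter names may vary slightly between hana-ml versions):
# Sketch: persist a PySpark DataFrame as a HANA table via hana-ml
from hana_ml.dataframe import ConnectionContext, create_dataframe_from_spark

# Placeholder connection details for SAP Data Warehouse Cloud
cc = ConnectionContext(
    address="<your-dwc-host>.hana.ondemand.com",
    port=443,
    user="<user>",
    password="<password>",
    encrypt=True,
)

# Creates a table from the Spark DataFrame and returns a HANA dataframe
hana_df = create_dataframe_from_spark(
    connection_context=cc,
    spark_df=df,
    table_name="FINAL_RESULT",
    force=True,   # assumed to overwrite the table if it already exists
)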
Regarding the direct use of hdbcli, you can find the full documentation here (also linked on PyPI).