Home > Enterprise >  How can I save a pyspark dataframe in databricks to SAP Hana (SAP Data Warehouse Cloud) using hdbcli
How can I save a pyspark dataframe in databricks to SAP Hana (SAP Data Warehouse Cloud) using hdbcli

Time:10-25

I need to push some data from Databricks on AWS to SAP Data Warehouse cloud, and have been encouraged to use the python hdbcli (https://pypi.org/project/hdbcli/). The only documentation I have been able to find is the one in pypi, which is quite scarce. I can see an example of how to push individual rows to a sql table, but I have found no examples of how to save a pyspark dataframe to a table in SAP Data Warehouse cloud.

Documentation examples:

sql = 'INSERT INTO T1 (ID, C2) VALUES (:id, :c2)'
cursor = conn.cursor()
id = 3
c2 = "goodbye"
cursor.execute(sql, {"id": id, "c2": c2})
# returns True
cursor.close()

I have tried the following in my data bricks notebook:

df.createOrReplaceTempView("final_result_local")
sql = "INSERT INTO final_result SELECT * FROM final_result_local"
cursor.execute(sql)
cursor.close()

After this I got the following error:

invalid table name: Could not find table/view FINAL_RESULT_LOCAL in schema DATABRICKS_SCHEMA

It seems df.createOrReplaceTempView created the sql table in a different context to the one called by hdbcli, and I don't know how to push the local table to sap data warehouse cloud. Any help would be much appreciated.

CodePudding user response:

I disagree with using hdbcli. Instead look into connecting from Spark directly, this instruction should be helpful.

CodePudding user response:

You should consider using the Python machine learning client for SAP HANA (hana-ml). You can think of it as being an abstraction layer on top of hdbcli. The central object to send and retrieve data is the HANA dataframe, which behaves similar to a Pandas dataframe, but is persisted on database side (i.e. this can be a table).

For your scenario, you should be able to create a HANA dataframe and thus a table using function create_dataframe_from_spark() (see documentation).

Regarding the direct use of hdbcli, you can find the full documentation here (also linked on PyPi).

  • Related