Passing a DataFrame from one notebook to another with pyspark


I'm trying to use a DataFrame that I created in notebook1 inside my notebook2, in Databricks Community Edition with pyspark. I tried this code: dbutils.notebook.run("notebook1", 60, {"dfnumber2"}), but it shows this error: py4j.Py4JException: Method _run([class java.lang.String, class java.lang.Integer, class java.util.HashSet, null, class java.lang.String]) does not exist

Any help please?

CodePudding user response:

The actual problem is that you pass the last parameter ({"dfnumber2"}) incorrectly: with this syntax it is a Python set, not a map. You need the syntax {"table_name": "dfnumber2"} to represent it as a dict/map.
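A minimal sketch of the corrected call (the key table_name is just an illustration here; it must match whatever widget name the called notebook reads):

# arguments must be a dict of widget name -> value, not a set
dbutils.notebook.run("notebook1", 60, {"table_name": "dfnumber2"})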

But if you look into the documentation of dbutils.notebook.run, you will see the following phrase:

To implement notebook workflows, use the dbutils.notebook.* methods. Unlike %run, the dbutils.notebook.run() method starts a new job to run the notebook.

But jobs aren't supported on the Community Edition, so it won't work anyway.
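As a workaround that does work on the Community Edition, %run (which the quoted docs contrast with dbutils.notebook.run) imports the other notebook into the current session, so any variables it defines become available directly. A minimal sketch, assuming dfnumber2 is defined in notebook1 (%run must be alone in its own cell):

%run ./notebook1

# in the next cell: dfnumber2 defined by notebook1 is now in scope
display(dfnumber2)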

CodePudding user response:

Create a global temp view and pass the table name as an argument to your next notebook.

dfnumber2.createOrReplaceGlobalTempView("dfnumber2")

dbutils.notebook.run("notebook1", 60, {"table_name": "dfnumber2"})

In your notebook1 you can do

table_name = dbutils.widgets.get("table_name")
dfnumber2 = spark.sql("select * from global_temp." + table_name)
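This works because createOrReplaceGlobalTempView registers the view in the cluster-wide global_temp database, so any notebook attached to the same cluster can query it, whereas a plain temp view is visible only within the session that created it.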
