Running spark.sql query in jupyter


I am starting the Jupyter notebook with:

pyspark --driver-class-path /home/statspy/postgresql-42.2.23.jar --jars /home/statspy/postgresql-42.2.23.jar

I am running this in Jupyter:

from pyspark.sql import SparkSession

# Path to the PostgreSQL JDBC driver
jardrv = '/home/statspy/postgresql-42.2.23.jar'

spark = SparkSession.builder.config('spark.driver.extraClassPath', jardrv).getOrCreate()

url = 'jdbc:postgresql://127.0.0.1/dbname'
properties = {'user': 'postgres', 'password': 'secret'}
df = spark.read.jdbc(url=url, table='tbname', properties=properties)

Then I can run:

df.printSchema()

and I get the schema.

But then I want to run queries like this:

spark.sql("""select * from tbname""")

and I get an error saying table or view tbname not found.

What do I need to change to run a query with spark.sql instead of using df?

CodePudding user response:

You need to register the DataFrame as a temporary view before querying it with spark.sql. The JDBC read gives you a DataFrame, but it is not registered in Spark's SQL catalog, so spark.sql cannot resolve the name tbname until you do:

df.createOrReplaceTempView("tbname")