I start the Jupyter notebook with:
pyspark --driver-class-path /home/statspy/postgresql-42.2.23.jar --jars /home/statspy/postgresql-42.2.23.jar
Then I run the following in the notebook:
jardrv = '/home/statspy/postgresql-42.2.23.jar'
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClassPath', jardrv).getOrCreate()
url = 'jdbc:postgresql://127.0.0.1/dbname'
properties = {'user':'postgres', 'password':'secret'}
df = spark.read.jdbc(url=url, table='tbname', properties=properties)
Then I can run:
df.printSchema()
and I get the schema back. But when I try to run a query like this:
spark.sql("""select * from tbname""")
I get an error saying "table or view tbname not found".
What do I need to change so that I can query the table
with spark.sql instead of using df?
CodePudding user response:
You need to register the DataFrame as a temporary view before you can reference it by name in spark.sql:
df.createOrReplaceTempView("tbname")
After that, spark.sql("select * from tbname") will resolve the table name.