How do I start a basic query in Databricks using Python?
The data I need is in Databricks, and so far I have been using JupyterHub to pull the data and modify a few things. Now I want to eliminate the step of pulling the data into JupyterHub, move my Python code directly into Databricks, and then schedule it as a job.
I started with the following:
%python
import pandas as pd
df = pd.read_sql('select * from databasename.tablename')
and got the error below:
TypeError: read_sql() missing 1 required positional argument: 'con'
So I tried this update:
%python
import pandas as pd
import pyodbc
odbc_driver = pyodbc.drivers()[0]
conn = pyodbc.connect(odbc_driver)
df = pd.read_sql('select * databasename.tablename', con=conn)
and I got the error below:
ModuleNotFoundError: No module named 'pyodbc'
Can anyone please help? I can use SQL to pull the data, but I already have a lot of Python code that I don't know how to convert to SQL. So for now I just want my Python code to work in Databricks.
CodePudding user response:
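You should use Spark's SQL facilities directly. In a Databricks notebook the `spark` session is already defined, so no connection object or ODBC driver is needed: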
my_df = spark.sql('select * FROM databasename.tablename')
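Note that `spark.sql` returns a Spark DataFrame, not a pandas one. If your existing code expects pandas, a minimal sketch could look like the following. It assumes the notebook's predefined `spark` session and a table small enough to fit in the driver's memory; the helper name `load_table_as_pandas` is made up for illustration and is not part of any library.

```python
def load_table_as_pandas(spark_session, table_name):
    """Query a table through Spark and return the result as a pandas DataFrame.

    spark_session: the SparkSession (in a Databricks notebook, the global `spark`).
    table_name: a trusted, fully qualified name such as 'databasename.tablename'.
    """
    # spark.sql returns a Spark DataFrame; nothing is pulled to the driver yet.
    spark_df = spark_session.sql(f"SELECT * FROM {table_name}")
    # toPandas() collects all rows onto the driver, so keep the result small.
    return spark_df.toPandas()

# In a Databricks notebook cell (assumes the predefined `spark` variable):
# df = load_table_as_pandas(spark, "databasename.tablename")
```

After that, `df` is an ordinary pandas DataFrame, so the rest of your pandas code should run unchanged. For large tables it is usually better to filter or aggregate in the SQL query first, since `toPandas()` loads everything into memory on one machine.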