I have a Pandas dataframe df which I want to push to a relational database as a table. I set up a connection object (<Connection>) using SQLAlchemy (pyodbc is the connection engine) and called

df.to_sql(<Table_Name>, <Connection>)

I was able to confirm that the table was written to the desired relational database by visually examining it in SQL Server Management Studio (SSMS). But in the left-hand list of databases and their tables in SSMS I see that it has been named

<Sender_Server>\<Username>.<Table_Name>

where <Sender_Server> is (I think) related to the name of the server I ran the Python command from, <Username> is my username on that server, and <Table_Name> is the desired table name.
When I right-click on the table and select the option to query the top one thousand rows, I get a query of the form

SELECT * FROM [<Data_Base_Name>].[<Sender_Server>\<Username>].[<Table_Name>]

which also contains the <Sender_Server>\<Username> part. The inclusion of <Sender_Server>\<Username> is undesired behaviour in our use case. How can I instead have the data pushed such that

SELECT * FROM [<Data_Base_Name>].[<Table_Name>]

is the appropriate query?
CodePudding user response:
By default, .to_sql() writes to the default schema of the current database user unless schema="schema_name" is provided. Say, for example, the database contains a table named dbo.thing and the database user named joan has a default schema named engineering. If Joan does

df.to_sql("thing", engine, if_exists="append", index=False)

it will not append to dbo.thing but will instead try to create an engineering.thing table and append to that. If Joan wants to append to dbo.thing she needs to do

df.to_sql("thing", engine, schema="dbo", if_exists="append", index=False)
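(In SQL Server you can check your current default schema with SELECT SCHEMA_NAME();.) Since a live SQL Server isn't assumed here, the sketch below uses an in-memory SQLite database with an ATTACHed database named dbo as a stand-in, to show that the schema= argument controls which qualified name the table is created under; against SQL Server you would pass the same schema="dbo" to your pyodbc-backed engine. All names (thing, the x column) are illustrative.

```python
import pandas as pd
from sqlalchemy import create_engine, event

# In-memory SQLite stand-in: an ATTACHed database named "dbo" plays
# the role of SQL Server's dbo schema for this demonstration.
engine = create_engine("sqlite://")

@event.listens_for(engine, "connect")
def _attach_dbo(dbapi_conn, connection_record):
    # Runs when the pooled connection is created, so "dbo" exists
    # before pandas uses the connection.
    dbapi_conn.execute("ATTACH DATABASE ':memory:' AS dbo")

df = pd.DataFrame({"x": [1, 2, 3]})

# Without schema=, the table would land in the user's default schema;
# with schema="dbo" it is created as dbo.thing.
df.to_sql("thing", engine, schema="dbo", if_exists="replace", index=False)

rows = pd.read_sql("SELECT x FROM dbo.thing ORDER BY x", engine)["x"].tolist()
print(rows)  # [1, 2, 3]
```

With a real SQL Server engine (e.g. one built from an mssql+pyodbc URL), the equivalent call is simply df.to_sql("<Table_Name>", engine, schema="dbo", index=False), after which SELECT * FROM [<Data_Base_Name>].[dbo].[<Table_Name>] finds the table.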