Using a buffer to write a psycopg3 copy result through pandas


Using psycopg2, I could write large results to CSV with copy_expert and a BytesIO buffer, then load them with pandas:

from io import BytesIO

copy_sql = "COPY (SELECT * FROM big_table) TO STDOUT CSV"

buffer = BytesIO()
cursor.copy_expert(copy_sql, buffer, size=8192)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel(self.output_file)

However, I can't figure out how to replace the buffer in copy_expert with psycopg3's new copy command. Has anyone figured out a way to do this?

CodePudding user response:

Data and table setup:

cat test.csv                                                                                                                                                            
1,[email protected],"John Stokes"
2,[email protected],"Emily Ray"

create table test_csv (id integer, mail varchar, name varchar);

import psycopg

# Assumes an open connection and cursor, e.g.:
con = psycopg.connect("dbname=test")  # connection details are placeholders
cur = con.cursor()

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        while data := f.read(1000):
            copy.write(data)
con.commit()
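The `while data := f.read(1000)` loop streams the file in fixed-size chunks rather than loading it all at once, and the same pattern works for any readable object. A database-free sketch of just the chunking step, with hypothetical data:

```python
from io import StringIO

# Stand-in for the CSV file (hypothetical rows)
src = StringIO('1,a@example.com,"John Stokes"\n2,b@example.com,"Emily Ray"\n')

chunks = []
while data := src.read(16):  # read in 16-character chunks
    chunks.append(data)

# Concatenating the chunks reproduces the original content exactly
assert ''.join(chunks) == src.getvalue()
```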

Using a buffer:

from io import BytesIO

buffer = BytesIO()
with cur.copy('copy test_csv to stdout') as copy:
    for data in copy:
        buffer.write(data)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel('test_csv.xlsx')
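The chunks yielded by iterating over the Copy object are bytes, so any writable binary object works here, including a file opened in 'wb' mode if you want to avoid holding the whole result in memory. A database-free sketch of the buffer-to-DataFrame step, with hypothetical chunks standing in for the copy output:

```python
from io import BytesIO

import pandas as pd

# Hypothetical chunks, as iterating a psycopg3 Copy object would yield them
chunks = [b'1,a@example.com,"John Stokes"\n', b'2,b@example.com,"Emily Ray"\n']

buffer = BytesIO()
for data in chunks:
    buffer.write(data)
buffer.seek(0)

# No header row in COPY ... CSV output, so name the columns explicitly
df = pd.read_csv(buffer, header=None, names=['id', 'mail', 'name'])
print(df.shape)  # (2, 3)
```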

Since you are using Pandas there is also:


from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:@localhost:5432/test')
pd.read_sql('select * from test_csv', engine).to_excel('test_csv.xlsx')

CodePudding user response:

Another option is the asyncpg driver (a separate, async-only PostgreSQL library, not psycopg3): its connection object has a copy_from_query method that writes a query's result to a file-like object, which achieves the same result as copy_expert with a BytesIO buffer. Here is an example of writing the results of a SELECT query to a buffer as CSV:

import asyncio
from io import BytesIO

import asyncpg
import pandas as pd

async def main():
    # Connect to the database (credentials are placeholders)
    conn = await asyncpg.connect(user='user', password='password',
                                 host='host', port=5432, database='mydatabase')

    # Create a BytesIO buffer
    buffer = BytesIO()

    # Stream the query result into the buffer in CSV format
    await conn.copy_from_query('SELECT * FROM big_table',
                               output=buffer, format='csv')
    await conn.close()

    # Seek to the beginning of the buffer
    buffer.seek(0)

    # Read the contents of the buffer into a pandas DataFrame
    df = pd.read_csv(buffer, engine="c")

    # Write the DataFrame to an Excel file
    df.to_excel("output.xlsx")

asyncio.run(main())

This code executes the SELECT query, streams the result into a BytesIO buffer as CSV, reads the buffer into a pandas DataFrame, and then writes the DataFrame to an Excel file.

Note that copy_from_query is a coroutine, so you need to call it from an async function and run that function with an event loop (asyncio.run does this for you).
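For illustration, the event-loop boilerplate on its own, with a placeholder coroutine standing in for the database work:

```python
import asyncio

async def run_query():
    # Placeholder for the real asyncpg calls
    await asyncio.sleep(0)
    return "done"

# asyncio.run creates an event loop, runs the coroutine, and closes the loop
result = asyncio.run(run_query())
print(result)  # done
```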

I used the new OpenAI GPT-3 to help answer this question.
