Is there any way to load part of table from DataBase?-CodePudding

I need to write function that will generate report from last week. I use pd.read_sql_table to import table. Problem is that this is taking 10 seconds. This is Postgresql DB. The database has a date column.

This is my code to import table:

engine = create_engine('postgresql://user:[email protected]/DB').connect()
df = pd.read_sql_table(table_name='young', con=engine)

Is there any solution to import only that part of table that i need? Thanks for all answers.

CodePudding user response：

If you want to load just the list of dates, or just a part of columns coming from your table, you have to specify the query using the function read_query() instead of read_sql_table()

Inside the function read_query(), you can specify the sql for retrieving the data just from the desired column(columns.

here's a code snippet that should work:

engine = create_engine('postgresql://user:[email protected]/DB').connect()
query = "SELECT <date column> FROM <my table>"
df = pd.read_sql(sql=query, con=engine)

is better to use pandas Series instead of dataframes if you read data from just one column (i think that pandas automatically generates series in this case).

EDIT:

looking back at your question, if i understood correctly, you need to retrieve the data from your table from last week.

one way, keeping my example above, is structuring the query, adding a WHERE condition, and filter out the dates that are older than a week ago

here's an example:

engine = create_engine('postgresql://user:[email protected]/DB').connect()
query = """SELECT *
           FROM <my table>
           WHERE <date column> <= {date}""".format(date=<calculated_date>)
df = pd.read_sql(sql=query, con=engine)

CodePudding user response：

use chunksize as follows:

df_iterator = pd.read_sql_table(your_table_name, con=engine, chunksize=num_rows_you_want)

it returns an iterator but you can get the first batch by:

df = next(df_iterator)