I need to write function that will generate report from last week. I use pd.read_sql_table to import table. Problem is that this is taking 10 seconds. This is Postgresql DB. The database has a date column.
This is my code to import table:
engine = create_engine('postgresql://user:[email protected]/DB').connect()
df = pd.read_sql_table(table_name='young', con=engine)
Is there any solution to import only that part of table that i need? Thanks for all answers.
CodePudding user response:
If you want to load just the list of dates, or just a part of columns coming from your table, you have to specify the query using the function read_query() instead of read_sql_table()
Inside the function read_query(), you can specify the sql for retrieving the data just from the desired column(columns.
here's a code snippet that should work:
engine = create_engine('postgresql://user:[email protected]/DB').connect()
query = "SELECT <date column> FROM <my table>"
df = pd.read_sql(sql=query, con=engine)
is better to use pandas Series instead of dataframes if you read data from just one column (i think that pandas automatically generates series in this case).
EDIT:
looking back at your question, if i understood correctly, you need to retrieve the data from your table from last week.
one way, keeping my example above, is structuring the query, adding a WHERE condition, and filter out the dates that are older than a week ago
here's an example:
engine = create_engine('postgresql://user:[email protected]/DB').connect()
query = """SELECT *
FROM <my table>
WHERE <date column> <= {date}""".format(date=<calculated_date>)
df = pd.read_sql(sql=query, con=engine)
CodePudding user response:
use chunksize as follows:
df_iterator = pd.read_sql_table(your_table_name, con=engine, chunksize=num_rows_you_want)
it returns an iterator but you can get the first batch by:
df = next(df_iterator)