I use 2 different methods to query data in DolphinDB Python API. The first one uses the loadTableBySQL
method, and the second uses the select
method:
s_db=np.datetime64(start_date.replace('.','-'))
e_db=np.datetime64(end_date.replace('.','-'))
s=time.time()
#method 1: 0.017m
stk_price = session.loadTableBySQL(tableName="KdayA11", dbPath="dfs://kline",sql=r"select * from KdayAll where trade_dt>={s},trade_dt<={e}".format(s=start_date,e=end_date))
#method 2: 0.00018m
tb=session.loadTable(tableName="KdayA11", dbPath="dfs://kline")
session.upload({'s_db':s_db})
session.upload({'e_db':e_db})
stk_price=tb.select("*").where('trade_dt>-s_db').where('trade_dt<=e_db')
e=time.time()
Why does the second method exhibit better performance by 100 times than the first?
CodePudding user response:
In fact, neither of the methods retrieves data from DolphinDB server because you need to use toDF()
to download the result from server to Python.
In your script, method 1 uses loadTableBySQL
to execute the query on the server side but does not download the result; while method 2 only concatenates the script which is actually not executed. That’s why the 2 methods show such performance difference.