I am trying to implement an auto-trading bot and want to backtest its performance on historical market data. The following function mocks the real quote source and serves the "current" prices:
public DateTime Current = new DateTime(2022, 08, 05, 8, 0, 0);

public async Task<List<Quote>> GetLatestQuotes(List<string> isins)
{
    using var context = _factory.CreateDbContext();
    List<Quote> quotes = new List<Quote>();
    foreach (var isin in isins)
    {
        // Most recent price row at or before the simulated "now" for this stock.
        var stockPrice = context.HistoricalMinutePrices
            .Where(x => x.Stock.Isin == isin && x.Time <= Current)
            .OrderByDescending(x => x.Time)
            .FirstOrDefault();
        if (stockPrice is null)
            continue;
        var quote = new Quote { Time = stockPrice.Time, Ask = stockPrice.Ask, Bid = stockPrice.Bid, Isin = isin };
        quotes.Add(quote);
    }
    Current = Current.AddMinutes(1); // advance the simulated clock by one minute
    Console.WriteLine($"[{Current:g}]");
    return quotes;
}
It fetches the "latest" price entry for each stock in the database, where "latest" means the most recent entry relative to Current, the simulated clock that advances by one minute per call.
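For context, this is roughly how my backtest loop drives the mock (a minimal sketch; the variable names, the example ISINs, and the strategy hook are made up for illustration):

// Hypothetical driver, not part of the real code: calls the mock once per
// simulated minute until the end of the tested range.
var isins = new List<string> { "US0378331005", "US5949181045" };
var end = new DateTime(2022, 08, 05, 22, 0, 0);
while (priceMock.Current < end)
{
    List<Quote> quotes = await priceMock.GetLatestQuotes(isins);
    // ... feed quotes into the trading strategy ...
}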
Now the problem is that I use a SQLite DB, and it contains a lot of entries even for a single day of data: 100 stocks * 60 minutes * 24 hours = 144,000 entries per day. I think the OrderBy costs a lot of performance, and so does the fact that each call fires 100 separate SQL queries. How can this be improved?
The database is local on the machine.
CodePudding user response:
Try the following query:
public async Task<List<Quote>> GetLatestQuotes(List<string> isins)
{
    using var context = _factory.CreateDbContext();

    var dataQuery = context.HistoricalMinutePrices
        .Where(x => isins.Contains(x.Stock.Isin) && x.Time <= Current);

    // For each distinct ISIN, take its single most recent price row.
    var query =
        from d in dataQuery.Select(d => new { d.Stock.Isin }).Distinct()
        from p in dataQuery.Where(p => p.Stock.Isin == d.Isin)
            .OrderByDescending(p => p.Time)
            .Take(1)
        select new Quote
        {
            Time = p.Time,
            Ask = p.Ask,
            Bid = p.Bid,
            Isin = d.Isin
        };

    var quotes = await query.ToListAsync();

    Current = Current.AddMinutes(1);
    Console.WriteLine($"[{Current:g}]");
    return quotes;
}
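This collapses the 100 per-ISIN round trips into a single query; EF Core translates the Take(1)-per-group pattern into a correlated subquery. Independent of the query shape, a composite index should also help SQLite resolve each lookup with an index seek instead of a scan-and-sort. A minimal sketch of the mapping, assuming the entity is named HistoricalMinutePrice and has a StockId foreign key (adjust the names to your model):

// In your DbContext; entity and property names are assumptions based on the
// question's code, not confirmed by it.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<HistoricalMinutePrice>()
        .HasIndex(p => new { p.StockId, p.Time }); // supports "latest Time per stock"
}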
CodePudding user response:
I found a solution that works quite well for now:
public DateTime Current = new DateTime(2022, 08, 05, 8, 0, 0);
private List<Quote> _quotesInDay;
private List<Quote> _quotesInHour;

public async Task<List<Quote>> GetLatestQuotes(List<string> isins)
{
    // Skip the overnight gap: jump from 22:00 straight to 08:00 the next day.
    if (Current.Hour == 22)
        Current = Current.AddHours(10);

    // Reload the cache once per simulated day.
    if (_quotesInDay is null || _quotesInDay.First().Time.Day != Current.Day)
    {
        using var context = _factory.CreateDbContext();
        _quotesInHour = null;
        _quotesInDay = context.HistoricalMinutePrices
            .Where(x => x.Time.Day == Current.Day)
            .Select(x => new Quote { Time = x.Time, Ask = x.Ask, Bid = x.Bid, Isin = x.Stock.Isin })
            .ToList();
    }

    // Re-slice the hourly chunk (with a 10-minute lookback) once per simulated hour.
    if (_quotesInHour is null || _quotesInHour.First().Time.Hour != Current.Hour)
        _quotesInHour = _quotesInDay
            .Where(x => x.Time >= Current.AddMinutes(-10) && x.Time < Current.AddMinutes(60))
            .OrderByDescending(x => x.Time)
            .ToList();

    List<Quote> quotes = new List<Quote>();
    foreach (var isin in isins)
    {
        // The chunk is sorted descending, so the first match is the latest quote.
        var quote = _quotesInHour.FirstOrDefault(x => x.Isin == isin && x.Time <= Current);
        if (quote is null)
            continue;
        quotes.Add(quote);
    }

    Current = Current.AddMinutes(1);
    Console.WriteLine($"[{Current:g}]");
    return quotes;
}
I load the whole day's data into memory with a single query at the start of each simulated day. I then slice that data into chunks of one hour (plus a short lookback), and each hourly chunk is sorted by time only once.
This way only one SQL query is fired per day (compared to roughly 100,000 before), and far less sorting happens.
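A further refinement I might try (a sketch only, not tested against the code above): group the hourly slice by ISIN once, so the per-minute lookup becomes a dictionary hit plus a short scan of one stock's quotes instead of a linear scan over all ~100 stocks' quotes.

private Dictionary<string, List<Quote>> _quotesByIsin;

// Rebuild whenever _quotesInHour is re-sliced; GroupBy preserves the
// descending Time order, so FirstOrDefault still returns the latest quote.
_quotesByIsin = _quotesInHour
    .GroupBy(x => x.Isin)
    .ToDictionary(g => g.Key, g => g.ToList());

// In the lookup loop, replace the FirstOrDefault scan with:
if (_quotesByIsin.TryGetValue(isin, out var list))
{
    var quote = list.FirstOrDefault(x => x.Time <= Current);
    if (quote is not null)
        quotes.Add(quote);
}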