Home > Enterprise >  C# / EF Core: How can the performance of this code / queries be improved?
C# / EF Core: How can the performance of this code / queries be improved?

Time:08-07

I am trying to implement a auto-trading bot and want to backtest its performance on historical market data. This function mocks the real one and presents the "current" prices:

public DateTime Current = new DateTime(2022, 08, 05, 8, 0, 0);
public async Task<List<Quote>> GetLatestQuotes(List<string> isins)
{
    using var context = _factory.CreateDbContext();
    List<Quote> quotes = new List<Quote>();
    foreach (var isin in isins)
    {
        var stockPrice = context.HistoricalMinutePrices.Where(x => x.Stock.Isin == isin && x.Time <= Current).OrderByDescending(x => x.Time).FirstOrDefault();
        if (stockPrice is null)
            continue;
        var quote = new Quote { Time = stockPrice.Time, Ask = stockPrice.Ask, Bid = stockPrice.Bid, Isin = isin };
        quotes.Add(quote);   
    }
    Current = Current.AddMinutes(1);
    Console.WriteLine($"[{Current:g}]");
    return quotes;
}

It gets the "latest" price entry for each stock in the database. "Latest" refers to the DateTime that moves by 1 minute each iteration.

Now the problem is, that I use a SQLite DB and it contains a lot of entries even for only a day of data: 100 stocks * 60 minutes * 24 hours = 144.000 entries a day. I think the OrderBy takes a lot of performance and the fact that it fires 100 single sqls. How this can be improved?

The Database is available local on the machine

CodePudding user response:

Try the following query:

public async Task<List<Quote>> GetLatestQuotes(List<string> isins)
{
    using var context = _factory.CreateDbContext();

    var dataQuery = context.HistoricalMinutePrices
        .Where(x => isins.Contains(x.Stock.Isin) && x.Time <= Current);
    
    var query = 
        from d in dataQuery.Select(d => new { d.Stock.Isin }).Distinct()
        from p in dataQuery.Where(p => p.Stock.Isin == d.Isin)
            .OrderByDescending(p => p.Time)
            .Take(1)
        select new Quote 
        { 
            Time = p.Time, 
            Ask = p.Ask, 
            Bid = p.Bid, 
            Isin = d.Isin 
        };

    var quotes = await query.ToListAsync();

    Current = Current.AddMinutes(1);
    Console.WriteLine($"[{Current:g}]");
    return quotes;
}

CodePudding user response:

I found a solution that works quite well for now:

public DateTime Current = new DateTime(2022, 08, 05, 8, 0, 0);
private List<Quote> _quotesInDay;
private List<Quote> _quotesInHour;
public async Task<List<Quote>> GetLatestQuotes(List<string> isins)
{
    if (Current.Hour == 22)
        Current = Current.AddHours(10);
    if (_quotesInDay is null || _quotesInDay.First().Time.Day != Current.Day)
    {
        using var context = _factory.CreateDbContext();
        _quotesInHour = null;
        _quotesInDay = context.HistoricalMinutePrices.Where(x => x.Time.Day == Current.Day).Select(x => new Quote { Time = x.Time, Ask = x.Ask, Bid = x.Bid, Isin = x.Stock.Isin }).ToList();
    }
    if (_quotesInHour is null || _quotesInHour.First().Time.Hour != Current.Hour)
        _quotesInHour = _quotesInDay.Where(x => x.Time >= Current.AddMinutes(-10) && x.Time < Current.AddMinutes(60)).OrderByDescending(x => x.Time).ToList();

    List<Quote> quotes = new List<Quote>();
    foreach (var isin in isins)
    {
        var quote = _quotesInHour.FirstOrDefault(x => x.Isin == isin && x.Time <= Current);
        if (quote is null)
            continue;
        quotes.Add(quote);
    }
    Current = Current.AddMinutes(1);
    Console.WriteLine($"[{Current:g}]");
    return quotes;
}

I load the whole data for one day at the beginning of each day in memory at once. Then I chunk this data into groups of one hour. This hourly data is sorted only once by time.

This way only one sql is fired for each day (compared to ~100k before) and way less sorting happens.

  • Related