Home > Software design >  Missing records when doing pagination with LINQ and Entity Framework Core
Missing records when doing pagination with LINQ and Entity Framework Core

Time:11-19

I have a table like below which contains a lot of data and would like to get chunk by chunk:

ProcessData

ProcessDataId      ProcessId       IsProcessed
----------------------------------------------
1                  100              0
2                  100              0
3                  100              0
4                  100              0
5                  100              0
.
.
167                100              0

I am getting 50 data at a time, processing it and updating IsProcessed value to 1 in the database. Now what is happening is I am able to process 150 data but I am not getting the last 17 records in the while loop because of this below call :

// update IsProcessed column in database for 50 records
await BulkUpdateProcessData(processDataChunk); 

When I call 'BulkUpdateProcessData' method after the while loop and update all the 167 records at the end then it works fine but I don't want to hold all the data in the memory for a long and then update all the data at the end.

Code:

var getUnProcessedDataQuery = context.ProcessData.Where(p=>p.ProcessId==100 && !IsProcessed).OrderBy(p=>p.ProcessDataId);

int unProcessedDataCount = getUnProcessedDataQuery.Count();

int processedBatch = 0;

while(processedBatch < unProcessedDataCount)
{
    var processDataChunk = getUnProcessedDataQuery.ToPagedList(processedBatch,50,unProcessedDataCount);
    foreach(var data in processDataChunk)
    {
       //process each Data
       try
       {
          //process data
          data.IsProcessed = true;
       }
       catch(Exception ex)
       { 
          data.IsProcessed = false
       }
    }

    // update IsProcessed column in database for 50 records
    await BulkUpdateProcessData(processDataChunk); 
    processedBatch = processedBatch   50;
}

public statis class PagingExtensions
{
   public static IPagedList<T> ToPagedList<T>(this IQueryable<T> sourcce, int skip,int pageSize,int totalCount)
   {
       pageSize = Math.Max(pageSize,1);
       
       var data = new List<T>();
       data.AddRange(source.Skip(skip).Take(pageSize).toList());
       return new PagedList<T>(data,skip,pageSize,totalCount);
    }
}

public class PagedList<T> : List<T>,IPagedList<T>
{
   public PagedList(IList<T> source, int pageIndex, int pageSize, int? totalCount = null)
   {
      pageSize  = Math.Max(pageSize,1)
      data.AddRange(totalCount!=null ? source : source.Skip(pageIndex * pageSize).Take(pageSize).toList());
    }
}

How can I fix this pagination issue?

Note: this is a background service which is getting the data in chunks and processing it.

CodePudding user response:

I wonder how it does update that much - since you are updating IsProcessed in the BulkUpdateProcessData the next getUnProcessedDataQuery.ToPagedList should ignore bunch of unprocessed data. Just take the 50 records with query and that's it:

var processDataChunk = await getUnProcessedDataQuery.Take(50).ToListAsync();

UPD

To filter out already tried items you can add filtering by max processed ID:

var maxId = int.MinValue; // or nullable int
while(...)
{
    var processDataChunk = await getUnProcessedDataQuery
        .Where(p => p.ProcessDataId > maxId)
        .Take(50)
        .ToListAsync();
    maxId = ...; // find max in chunk
    // ...
}
  • Related