Home > database >  this code takes a 2hrs to compare and sort 20,000 items each, is there a better way to write this c#
this code takes a 2hrs to compare and sort 20,000 items each, is there a better way to write this c#

Time:09-30

I am trying to sort all the updated item in DataTableA, by coloring the item that has not been completely updated, and removing the item that has been updated completely from the DataTable. both The item that has been updated completely and the incomplete updated item are in "managed" table in the database, the discharge date will be null if it has not been completely updated.

Below code works but it can take all day for the page to run. This is C# webform.

The code below is writing on my code behind file:

 foreach (GridDataItem dataItem in RadGrid1.Items)
  {
    var panu = dataItem["Inumber"];
    var panum = panu.Text;
    var _cas = db.managed.Any(b =>
        b.panumber == panum && b.dischargedate != null);                
    var casm = db.managed.Any(b => 
        b.panumber == panum && b.dischargedate == null);
   if (_cas == true)
     {
        dataItem.Visible = false;
        
      }
   if (casm == true)
     {
        dataItem.BackColor = Color.Yellow;
     }

 }

CodePudding user response:

Well, 900 items might be worth simply fetching to a list in memory and then process that. It will definitely be faster, although it consumes more memory.

You can do something like this (assuming the type of managed is Managed):

List<Managed> myList = db.managed.ToList();

That will fetch the whole table.

Now replace your code with:

foreach (GridDataItem dataItem in RadGrid1.Items)
  {
    var panu = dataItem["Inumber"];
    var panum = panu.Text;
    var _cas = myList .Any(b =>
        b.panumber == panum && b.dischargedate != null);                
    var casm = myList .Any(b => 
        b.panumber == panum && b.dischargedate == null);
   if (_cas == true)
     {
        dataItem.Visible = false;
        
      }
   if (casm == true)
     {
        dataItem.BackColor = Color.Yellow;
     }

 }

You should see a huge performance approvement.

Another thing: You don't mention what database you're using, but you should make sure the panumber column is properly indexed.

CodePudding user response:

As mentioned in the comment, each call to db.managed.Any will create a new SQL query.

There are various improvements you can make to speed this up:

  1. First, you don't need to call db.managed.Any twice inside the loop, if it's checking the same unique entity. Call it just once and check dischargedate. This alone with speed up the loop 2x.

     // one database call, fetching one column
     var dischargedate = db.managed
         .Select(x => x.dischargedate)
         .FirstOrDefault(b => b.panumber == panum);
    
     var _cas = dischargedate != null;
     var casm = dischargedate == null;
    
  2. If panumber is not a unique primary key and you don't have a sql index for this column, then each db.managed.Any call will scan all items in the table on each call. This can be easily solved by creating an index with panum and dischargedate, so if you don't have this index, create it.

  3. Ideally, if the table is not huge, you can just load it all at once. But even if you have tens of millions of records, you can split the loop into several chunks, instead of repeating the same query over and over again.

  4. Consider using better naming for your variables. _cas and casm are a poor choice of variable names.

    Pro tip: Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.

So if you don't have hundreds of thousands of items, here is the simplest fix: load panumber and discharge values for all rows from that table into memory, and then use a dictionary to instantly find the items:

// load all into memory
var allDischargeDates = await db.managed
    .Select(x => new { x.panumber, x.dischargedate })
    .ToListAsync(cancellationToken);

// create a dictionary so that you can quickly map panumber -> dischargedate
var dischargeDateByNumber = dbItems
    .ToDictionary(x => x.panumber, x => x.dischargedate);
    
foreach (var dataItem in RadGrid1.Items)
{
    var panu = dataItem["Inumber"];
    var panum = panu.Text;
    
    // this is very fast to check now
    if (!dischargeDateByNumber.TryGetValue(panum, out DateTime? dischargeDate))
    {
        // no such entry - in this case your original code will just skip the item
        return;            
    }
    
    if (dischargeDate != null)
    {
        dataItem.Visible = false;
    }
    else
    {
        dataItem.BackColor = Color.Yellow;
    }
}

If the table is huge and you only want to load certain items, you would do:

// get the list of numbers to fetch from the database
// (this should not be a large list!)
var someList = RadGrid1
   .Items
   .Select(x => x["Inumber"].Text)
   .ToList();

// load these items into memory
var allDischargeDates = await db.managed
    .Where(x => someList.Contains(x.panumber))
    .Select(x => new { x.panumber, x.dischargedate })
    .ToListAsync(cancellationToken);

But there is a limit on how large someList can be (you don't want to run this query for a list of 200 thousand items).

  • Related