I am trying to save roughly 20,000 records, and it takes an unforgivably long time to finish. What is a more efficient way of achieving this?
Here's what I currently have:
public async Task SaveOrUpdateItemAsync(List<DepartmentalizedItem> departmentalizedItems)
{
    using (WarehouseContext dbContext = new WarehouseContext())
    {
        using (SemaphoreSlim throttler = new SemaphoreSlim(20))
        {
            var tasks = departmentalizedItems.Select(async item =>
            {
                await throttler.WaitAsync();
                if (item.PK_DepartmentalizedItemId == 0)
                    dbContext.DepartmentalizedItems.Add(item);
                else
                {
                    var deptItem = await dbContext.DepartmentalizedItems.FindAsync(item.PK_DepartmentalizedItemId);
                    dbContext.Entry(deptItem).CurrentValues.SetValues(item);
                }
                throttler.Release();
            });
            await Task.WhenAll(tasks);
        }
        await dbContext.SaveChangesAsync();
    }
}
I've also tried Parallel.ForEach but I'm having synchronization issues.
Thank you.
CodePudding user response:
For Insert
You can use the AddRange method to insert bulk data into the database; it should not take nearly as long. If you call Add once per item, EF runs change detection on every call (and the underlying list also grows gradually, doubling as needed), whereas AddRange checks the size of the added items once and performs change detection for the whole batch in a single pass.
For Update
You can use UpdateRange to mark multiple records as modified in a single call, which saves the per-entity overhead of processing each update separately.
e.g.
public class abc
{
    public int propa { get; set; }
    public string propb { get; set; }
}

public bool AddUpdate(List<abc> abc)
{
    List<abc> added = new List<abc>();
    List<abc> updated = new List<abc>();
    foreach (var item in abc)
    {
        // Assumption: if propa has no value it is a new record to be added; otherwise it is an update
        if (item.propa == 0)
        {
            added.Add(item);
        }
        else
        {
            updated.Add(item);
        }
    }
    if (added.Any())
    {
        _dbContext.abc.AddRange(added);
    }
    if (updated.Any())
    {
        _dbContext.abc.UpdateRange(updated);
    }
    _dbContext.SaveChanges();
    return true;
}
CodePudding user response:
I would recommend some profiling to see what part is actually taking time. I would also recommend posting times and the size of the data you are adding to be able to estimate how reasonable the times are. "unforgivable amount of time" will depend on context and patience. See Fastest way of inserting where the top post inserts 500k items in ~4 minutes.
A possible issue might be the FindAsync, if it needs to run a query for each item I would expect it to be quite slow. You could avoid this by querying all the items at once, something like:
var ids = departmentalizedItems.Select(i => i.PK_DepartmentalizedItemId).ToList();
var itemsById = dbContext.DepartmentalizedItems
    .Where(i => ids.Contains(i.PK_DepartmentalizedItemId))
    .ToDictionary(i => i.PK_DepartmentalizedItemId, i => i);
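With that lookup dictionary in place, the update loop needs no per-item round trip to the database. A minimal sketch, assuming the same entity and key names as in the question:

foreach (var item in departmentalizedItems)
{
    if (item.PK_DepartmentalizedItemId == 0)
    {
        dbContext.DepartmentalizedItems.Add(item);
    }
    else if (itemsById.TryGetValue(item.PK_DepartmentalizedItemId, out var existing))
    {
        // Copy scalar values onto the already-tracked entity; no extra query needed
        dbContext.Entry(existing).CurrentValues.SetValues(item);
    }
}
await dbContext.SaveChangesAsync();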
I would also get rid of the semaphore and change it to a regular for-loop. dbContext
is not thread safe, so whatever you are doing will not actually run in parallel; you are only making your code harder to understand for no real gain.
Also note that Async
will probably not make your code faster in any way; it is meant to hide latency, not improve throughput. I would also consider splitting your items into chunks, and adding some way to report progress to the user.
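A sketch of the chunking idea (inserts only, for brevity), using LINQ's Chunk method (available since .NET 6) and an assumed IProgress<int> parameter named progress for reporting:

int saved = 0;
foreach (var chunk in departmentalizedItems.Chunk(1000))
{
    // A fresh context per chunk keeps the change tracker from growing unbounded
    using (var dbContext = new WarehouseContext())
    {
        dbContext.DepartmentalizedItems.AddRange(chunk);
        await dbContext.SaveChangesAsync();
    }
    saved += chunk.Length;
    progress?.Report(saved); // 'progress' is a hypothetical IProgress<int> supplied by the caller
}

Disposing and recreating the context between chunks also bounds memory use, since each SaveChangesAsync only has to process one batch of tracked entities.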