I need to improve the performance of my insert in my C# application. I first go out and get data from a view. Then I go through a FOREACH
loop to insert into a table. I have over 200,000 records that I am working with and it takes an ridiculous amount of time to perform this task. I know the SaveChanges
is a round trip to the database but I'm not sure how to get around this. Is there something I can do to improve the time?
var values = db.TodaysAirs.ToList();
foreach (TodaysAir x in values)
{
//check to see if this is a new value or one that needs to be updated
var checkForNew = db.TodaysAirValues
.Where(m => m.ID == x.ID);
//new record
if (checkForNew.Count() == 0)
{
TodaysAirValue newRecord = new TodaysAirValue();
newRecord.ID = x.ID;
newRecord.Logger_Id = x.Logger_Id;
newRecord.SiteName = x.SiteName;
newRecord.Latitude = x.Latitude;
newRecord.Longitude = x.Longitude;
newRecord.Hour = x.Hour;
newRecord.Parameter = x.Parameter;
newRecord.Stan = x.Stan;
newRecord.Units = x.Units;
newRecord.InstrumentType = x.InstrumentType;
newRecord.NowCast = x.NowCast;
newRecord.AQIValue = x.AQIValue;
newRecord.HealthCategory = x.HealthCategory;
newRecord.Hr24Avg = x.Hr24Avg;
newRecord.Hr24Max = x.Hr24Max;
newRecord.Hr24Min = x.Hr24Min;
newRecord.SID = DateTime.Now;
db.TodaysAirValues.Add(newRecord);
db.SaveChanges();
// CallJenkinsJob();
}
}
CodePudding user response:
The goal should be to run a single raw SQL statement that will look something very much like this:
INSERT INTO TodaysAirValues
(ID, Logger_id, SiteName, Latitude, Longitude, Hour, Parameter,
Stan, Units, InstrumentType, NowCast, AQIValue, HealthCategory,
Hr24Avg, Hr24Max, Hr24Min, SID)
SELECT ta.ID, ta.Logger_id, ta.SiteName, ta.Latitude, ta.Longitude,
ta.Hour, ta.Parameter, ta.Stan, ta.Units, ta.InstrumentType,
ta.NowCast, ta.AQIValue, ta.HealthCategory, ta.Hr24Avg,
ta.Hr24Max, ta.Hr24Min, current_timestamp
FROM TodaysAirs ta
LEFT JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE tav.ID IS NULL
This might not have all the table or column names exactly right, if there are any differences from the EF mapping with the database. You might also get it to go even a little faster using NOT EXISTS()
rather than the LEFT JOIN WHERE NULL
technique.
I also see this:
if the Count is greater than 0 it checks to see if any changes where made and if so update the record.
In that case, you can still get this down to just two SQL commands if you precede (run this one first!) the INSERT above with an UPDATE that looks something like this:
UPDATE tav
SET tav.ID = ta.ID, tav.Logger_ID = ta.Logger_id
-- etc...
FROM TodaysAirs ta
INNER JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE (
-- compare here to decide if the record needs to update or not
)
Unfortunately I don't have enough info about what you want that look like to give you the full code.
CodePudding user response:
if I understand correctly, you can try to use linq to SQL that might use OUTER JOIN
on those tables then get rows which might need to be inserted.
we can use DbContext.AddRange
then call SaveChanges
var insertedList = (from air in db.TodaysAirs
join val in db.TodaysAirValues on val.ID equals air.ID into Details
from m in Details.DefaultIfEmpty()
where m == null
select air).ToList();
db.TodaysAirValues.AddRange(insertedList.Select(air=>
new TodaysAirValue()
{
ID = air.ID,
Logger_Id = air.Logger_Id,
SiteName = air.SiteName,
Latitude = air.Latitude,
Longitude = air.Longitude,
Hour = air.Hour,
Parameter = air.Parameter,
Stan = air.Stan,
Units = air.Units,
InstrumentType = air.InstrumentType,
NowCast = air.NowCast,
AQIValue = air.AQIValue,
HealthCategory = air.HealthCategory,
Hr24Avg = air.Hr24Avg,
Hr24Max = air.Hr24Max,
Hr24Min = air.Hr24Min,
SID = DateTime.Now
})
);
db.SaveChanges();