How to avoid adding duplicate data from CSV file to SQL Server database. Using CSVHelper and C# Blaz

Time:11-14

I have my database table named 'JobInfos' in SQL Server which contains many columns.

JobID - (int) auto-populated with an incrementing value when a row is added
OrgCode - (string)
OrderNumber - (int)
WorkOrder - (int)
Customer - (string)
BaseModelItem - (string)
OrdQty - (int)
PromiseDate - (string)
LineType - (string)

This table gets written to many times a day using a Blazor application with Entity Framework and CSVHelper. This works perfectly. All rows from the CSV file are added to the database.

if (fileExist)
{
    using (var reader = new StreamReader(@path))
    using (var csv = new CsvReader(reader, config))
    {
        // Map each CSV row to a JobInfo entity (lazy: nothing is read until enumerated).
        var records = csv.GetRecords<CsvRow>().Select(row => new JobInfo()
        {
            OrgCode = row.OrgCode,
            OrderNumber = row.OrderNumber,
            WorkOrder = row.WorkOrder,
            Customer = row.Customer,
            BaseModelItem = row.BaseModelItem,
            OrdQty = row.OrdQty,
            PromiseDate = row.PromiseDate,
            LineType = row.LineType,
        });

        using (var db = new ApplicationDbContext())
        {
            while (!reader.EndOfStream)
            {
                if (lineNumber != 0)
                {
                    db.AddRange(records.ToList());
                    db.SaveChanges();
                }

                lineNumber++;
            }

            NavigationManager.NavigateTo("/", true);
        }
    }
}

Because these CSV files can contain rows that are already in the database table, I end up with duplicate records when the table is read, and users have to delete the newer duplicate rows manually to keep only the original entry.

I have no control over the CSV files or how they are created. I am trying to add only rows that contain new data, based on the WorkOrder number, which must be unique across all rows.

I found another post here on StackOverflow which helps but I am stuck with a remaining error I can't figure out.

The Helpful post

I changed my code here...

if (lineNumber != 0)
    {
        var recordworkorder = records.Select(x => x.WorkOrder).ToList();

        var workordersindb = db.JobInfos.Where(x =>  recordworkorder.Contains(x.WorkOrder)).ToList();

        var workordersNotindb = records.Where(x => !workordersindb.Contains(x.WorkOrder));

        db.AddRange(records.ToList(workordersNotindb));

        db.SaveChanges();
}

but this line...

var workordersNotindb = records.Where(x => !workordersindb.Contains(x.WorkOrder));

throws an error at the end (x.WorkOrder) - CS1503 Argument 1: cannot convert from 'int' to 'DepotQ4.Data.JobInfo'

WorkOrder is an int.
JobID is the primary key and an int.
Every record in the table must have a unique WorkOrder.

I am not sure what I am not seeing. Could use some help here please?

CodePudding user response:

Your variable workordersindb is a List<JobInfo>, so workordersindb.Contains(...) expects a JobInfo argument. In records.Where(x => !workordersindb.Contains(x.WorkOrder)) you pass x.WorkOrder, which is an int, and that mismatch is what CS1503 reports. workordersindb needs to be a List<int> for Contains to accept a WorkOrder. You already handled this for records by creating recordworkorder with records.Select(x => x.WorkOrder) to get a List<int>; do the same for the database query by adding a Select before ToList:

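if (lineNumber != 0)
{
    // Read the CSV once: GetRecords<T>() is lazy and forward-only,
    // so enumerating `records` a second time would yield no rows.
    var incoming = records.ToList();

    var recordworkorder = incoming.Select(x => x.WorkOrder).ToList();

    // Project to int so that Contains compares WorkOrder values, not JobInfo objects.
    var workordersindb = db.JobInfos.Where(x => recordworkorder.Contains(x.WorkOrder)).Select(x => x.WorkOrder).ToList();

    var workordersNotindb = incoming.Where(x => !workordersindb.Contains(x.WorkOrder));

    db.JobInfos.AddRange(workordersNotindb);

    db.SaveChanges();
}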
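
The filtering step can also be tried without a database. The following is a minimal, self-contained sketch and not the poster's actual code: JobInfo here is a trimmed-down stand-in for the real entity, and JobInfoFilter and NewRowsOnly are hypothetical names introduced for illustration. It uses a HashSet<int> rather than List<int>.Contains, which also skips a WorkOrder that appears twice within the same CSV file, a case that a check against the database alone would let through.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the JobInfo entity in the question.
public class JobInfo
{
    public int WorkOrder { get; set; }
    public string Customer { get; set; } = "";
}

// Hypothetical helper; the name is an assumption, not part of the original code.
public static class JobInfoFilter
{
    // Returns the incoming rows whose WorkOrder is neither already stored
    // nor repeated earlier in the same batch. Input order is preserved.
    public static List<JobInfo> NewRowsOnly(
        IEnumerable<JobInfo> incoming,
        IEnumerable<int> existingWorkOrders)
    {
        // Seed the set with the WorkOrder values that are already in the table.
        var seen = new HashSet<int>(existingWorkOrders);
        var result = new List<JobInfo>();

        foreach (var row in incoming)
        {
            // HashSet<T>.Add returns false when the value is already present,
            // so one set covers both "in the database" and "seen earlier in this file".
            if (seen.Add(row.WorkOrder))
            {
                result.Add(row);
            }
        }

        return result;
    }
}
```

In the Entity Framework code, the second argument would be the List<int> of WorkOrder values loaded from db.JobInfos, and the returned list is what gets passed to AddRange before SaveChanges.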