How to avoid duplicates when inserting data using OleDB and Entity Framework?

Time:11-03

I need to import Excel report data into the company DB, but my code just reads and inserts without checking for duplicates. I tried AddOrUpdate(), but I couldn't make it work.

Any ideas on how to go through the datareader results and filter already existing IDs so they are not inserted again?

DataView ImportarDatosSites(string filename)
        {
            string conexion = string.Format("Provider = Microsoft.ACE.OLEDB.12.0; Data Source={0}; Extended Properties= 'Excel 8.0;HDR=YES'" ,filename );
            using (OleDbConnection connection = new OleDbConnection(conexion))
            {
                connection.Open();
                OleDbCommand command = new OleDbCommand("SELECT * FROM [BaseSitiosTelemetria$]", connection);
                OleDbDataAdapter adaptador = new OleDbDataAdapter { SelectCommand = command };
                DataSet ds = new DataSet();
                adaptador.Fill(ds);
                DataTable dt = ds.Tables[0];
          
                using (OleDbDataReader dr = command.ExecuteReader())
                {
                    while (dr.Read())
                    {
                        var SiteID     = dr[1];
                        var ID_AA_FB   = dr[2];
                        var Address    = dr[3];
                        var CreateDate = dr[5];
                        var Tipo       = dr[7];
                        var Measures   = dr[9];
                        var Latitud    = dr[10];
                        var Longitud   = dr[11];

                        SitesMtto s = new SitesMtto();

                        s.siteIDDatagate      = SiteID.ToString();
                        s.idFieldBeat         = ID_AA_FB.ToString();
                        s.addressDatagate     = Address.ToString();
                        s.createDateDatagate  = Convert.ToDateTime(CreateDate);
                        s.typeDevice          = Tipo.ToString();
                        s.MeasuresDevice      = Measures.ToString();
                        if (Latitud.ToString() != "" && Longitud.ToString() != "")
                        {
                            s.latitudeSite  = Convert.ToDouble(Latitud);
                            s.longitudeSite = Convert.ToDouble(Longitud);
                        }

                        db.SitesMtto.Attach(s);
                        db.SitesMtto.Add(s);
                        db.SaveChanges();
                    }

                    connection.Close();
                    return ds.Tables[0].DefaultView;
                }
            }
        }

Answer:

One way is to add a primary key or unique index on the ID column using T-SQL and wrap the insert in a try/catch block. When a row violates the constraint, the database throws an error (surfaced by Entity Framework as a DbUpdateException), which you can catch and skip.

Answer:

When it comes to an import process from an external source, I recommend a staging table approach. Dump the raw data from the Excel file into a clean staging table, executing a TRUNCATE TABLE against the staging table first. From there you can run a query that joins against the real data table to detect and ignore or update possible duplicates, inserting a real row for each staged row that doesn't already have a corresponding one.

Depending on the number of rows, I would recommend batching the reads and inserts. You also don't need to call both Attach() and Add(); simply adding the item to the DbSet is sufficient.
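A minimal sketch of the batching idea, assuming the rows have already been mapped to entities. The hypothetical `BatchImport.InsertInBatches` helper collects rows into fixed-size lists and hands each list to a save callback, so the database is hit once per batch rather than once per row as in the question's loop. The helper name and the batch size are illustrative, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an EF API): groups rows into fixed-size batches
// and passes each batch to a caller-supplied save action.
public static class BatchImport
{
    public static int InsertInBatches<T>(IEnumerable<T> rows, int batchSize, Action<List<T>> saveBatch)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        int batchCount = 0;

        foreach (var row in rows)
        {
            batch.Add(row);
            if (batch.Count == batchSize)
            {
                saveBatch(batch);
                batchCount++;
                // Start a fresh list so the caller may keep the previous one.
                batch = new List<T>(batchSize);
            }
        }

        // Flush the final, partially filled batch.
        if (batch.Count > 0)
        {
            saveBatch(batch);
            batchCount++;
        }

        return batchCount;
    }
}
```

With EF6 the callback could be `batch => { db.SitesMtto.AddRange(batch); db.SaveChanges(); }`; `AddRange` marks the entities as Added on its own, so no `Attach()` call is needed.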

Step 1: Flush the staging table using db.Database.ExecuteSqlCommand("TRUNCATE TABLE stagingSitesMtto");

Step 2: Open the data reader and bulk-insert the rows into the stagingSitesMtto table. This assumes that the Excel/file source does not include duplicate rows within it.
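If the sheet might repeat an ID, the assumption in Step 2 can be enforced before anything is written to staging. The sketch below assumes the rows are already in a `DataTable` (like the one the question fills with `adaptador.Fill(ds)`) and that the key column is called `SiteID`; both the helper and the column name are hypothetical. Keys are compared case-insensitively on the assumption that the database uses SQL Server's default case-insensitive collation.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: keeps only the first row for each key value so the
// staging table never receives in-file duplicates.
public static class StagingPrep
{
    public static DataTable DistinctByKey(DataTable source, string keyColumn)
    {
        DataTable result = source.Clone();   // same columns, no rows
        // Assumption: case-insensitive keys, mirroring a default SQL Server collation.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (DataRow row in source.Rows)
        {
            object value = row[keyColumn];
            if (value == DBNull.Value)
                continue;                     // skip rows without a key

            string key = value.ToString().Trim();
            if (key.Length == 0)
                continue;                     // skip blank keys as well

            if (seen.Add(key))                // Add returns false for a repeat
                result.ImportRow(row);
        }

        return result;
    }
}
```

The deduplicated table could then be written to `stagingSitesMtto`, for example with `SqlBulkCopy.WriteToServer(DataTable)`.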

Step 3: Query your stagingSitesMtto table, joining it to your SitesMtto table on the PK/unique key. This is a bit more involved, because Join performs an INNER JOIN, whereas here we want a LEFT OUTER JOIN: we are interested in staging rows that have no corresponding site.

var query = db.StagingSitesMtto
    .GroupJoin(db.SitesMtto,
       staging => staging.SiteID,
       site => site.siteIDDatagate,
       (staging, sites) => new
       {
           Staging = staging,
           Sites = sites
       })
    .SelectMany(group => group.Sites.DefaultIfEmpty(),
        (group, site) => new
        {
            Staging = group.Staging,
            IsNew = site == null
        })
    .Where(x => x.IsNew)
    .Select(x => x.Staging)
    .ToList(); // Or run in a loop with Skip and Take (LINQ to Entities needs an OrderBy before Skip)

This selects all staging rows that do not have a corresponding real row. From there you can create new SitesMtto entities, copy the data across from each staging row, add them to db.SitesMtto, and save. If you want to update existing rows as well as insert new ones, return the Staging and Site along with the IsNew flag and update .Site using the values from .Staging. With change tracking enabled, the existing Site will be updated on SaveChanges if any values were altered.
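The insert-or-update step can be sketched in memory as follows. `StagingSite` and the trimmed-down `SitesMtto` class are stand-ins that model only three of the question's columns, and `SiteMerge.Merge` is a hypothetical helper: existing sites are edited in place (the kind of change EF's tracker would detect on `SaveChanges`), and staged rows with no match come back as new entities to add.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a row of the hypothetical stagingSitesMtto table.
public class StagingSite
{
    public string SiteID { get; set; }
    public string Address { get; set; }
    public string Tipo { get; set; }
}

// Stand-in for the EF entity of the same name (only three columns modelled).
public class SitesMtto
{
    public string siteIDDatagate { get; set; }
    public string addressDatagate { get; set; }
    public string typeDevice { get; set; }
}

public static class SiteMerge
{
    // Updates matching sites in place and returns new entities for the rest.
    // Assumes the staged rows contain no duplicate SiteID values (see Step 2).
    public static List<SitesMtto> Merge(
        IEnumerable<StagingSite> staged,
        IDictionary<string, SitesMtto> existingById,
        out int updatedCount)
    {
        var toInsert = new List<SitesMtto>();
        updatedCount = 0;

        foreach (var s in staged)
        {
            if (existingById.TryGetValue(s.SiteID, out var site))
            {
                // Only touch the entity when a value actually differs.
                if (site.addressDatagate != s.Address || site.typeDevice != s.Tipo)
                {
                    site.addressDatagate = s.Address;
                    site.typeDevice = s.Tipo;
                    updatedCount++;
                }
            }
            else
            {
                toInsert.Add(new SitesMtto
                {
                    siteIDDatagate = s.SiteID,
                    addressDatagate = s.Address,
                    typeDevice = s.Tipo
                });
            }
        }

        return toInsert;
    }
}
```

With EF6, `existingById` could be built from tracked entities, for example `db.SitesMtto.Where(x => ids.Contains(x.siteIDDatagate)).ToDictionary(x => x.siteIDDatagate)`, where `ids` is a list of the staged IDs; the returned list would then go to `db.SitesMtto.AddRange(...)` before a single `db.SaveChanges()`.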

Disclaimer: The code above wasn't tested; it was written from memory, using the following as a reference for the outer join approach: How to make LEFT JOIN in Lambda LINQ expressions

Hopefully that gives you something to consider for handling imports.
