EF Core allow attaching entity multiple times

Time:01-06

This is, I guess, a bit more sophisticated. I have successfully created this model code-first with EF Core 7 (I assume the behavior is the same for all EF Core versions).

[Image: diagram of the entity model]

The culprit is the entity SubJourney, which appears as a child of TrustFrameworkPolicies and as a child of Candidates. I can create the model, and the resulting database schema looks fine.

I need to add the data in one step, because in real life it's one single XML import.

However, adding entities has limitations. Let's assume this code to add the data in one step:

var guid = Guid.NewGuid();
var sj1 = new SubJourney  {
  DbKey = guid,
  Type = SubJourneyTYPE.Transfer
};
var sj2 = new SubJourney  {
  DbKey = guid,
  Type = SubJourneyTYPE.Transfer
};
var trustFrameworkPolicy = new TrustFrameworkPolicy {
  UserJourneys = new List<UserJourney> {
    new UserJourney {
      Id = "Journey1",
      OrchestrationSteps = new List<OrchestrationStepUserJourney>  {
        new OrchestrationStepUserJourney {
          JourneyList = new List<Candidate> {
            new Candidate {
              SubJourneyReferenceId = "Test",
              SubJourney = sj1
            }
          }
        }
      }
    }
  },

  SubJourneys = new List<SubJourney>  {
    sj2
  }
};
context.Set<TrustFrameworkPolicy>().Add(trustFrameworkPolicy);
context.SaveChanges();

As you can see, the objects sj1 and sj2 have identical values but are separate instances. That's how they appear in the XML import. From the perspective of the database, however, they are the same row (and I actually want to treat them as the same).

To get it working I just need to use the same object, like so:

var sj = new SubJourney
{
  Type = SubJourneyTYPE.Transfer
};

If I reference just this sj in both positions, EF Core treats it as one. However, because the object graph is created by a serializer (and contains hundreds of entities), this is not feasible.

The Errors

If I enforce the same primary key for both I get this:

System.InvalidOperationException: The instance of entity type 'SubJourney' cannot be tracked because another instance with the key value '{DbKey: d44948dc-d514-4928-abea-3450150c26c4}' is already being tracked. When attaching existing entities, ensure that only one entity instance with a given key value is attached.

My reading: the primary keys must not be the same if the entity appears twice.

If I do not enforce the same primary key I get this:

The INSERT statement conflicted with the FOREIGN KEY constraint "FK_Candidates_SubJourneys_SubJourneyDbKey"

My reading: the primary keys must be the same to fulfil the constraint.

In the debugger the graph shows properly with the object inserted twice:

[Image: debugger view of the object graph]

The Question

How does EF Core recognize objects as the "same"? The two errors seem to demand opposite things. What I want is to add the entity twice (enforced by the serializer) and still have it treated as one entity (enforced by my schema).

What I tried

I read https://learn.microsoft.com/en-us/ef/core/change-tracking/identity-resolution

The text suggests using ReferenceLoopHandling, but XmlSerializer has no such option. I tried serializing the graph as JSON and deserializing it back with the suggested options, but Newtonsoft.Json doesn't see this as a loop, because the objects reference each other only indirectly. In the end, same error.

Setting the primary key doesn't work, as shown. Overriding GetHashCode/Equals doesn't work either.

I have also tried to manipulate the ChangeTracker:

  context.ChangeTracker.TrackGraph(trustFrameworkPolicy, node =>
  {
    //Console.WriteLine($"***** Tracking  {node.Entry.Entity.GetType().Name}");
    if (node.Entry.Entity is SubJourney subJourney)
    {
      Console.WriteLine("*** Recognized Subjourney ***");
      var keyValue = node.Entry.Property(nameof(SubJourney.Id)).CurrentValue;
      // Key is another property I know is set and unique (not the PK)
      var entityType = node.Entry.Metadata;

      var existingEntity = node.Entry.Context.ChangeTracker.Entries()
          .FirstOrDefault(
              e => Equals(e.Metadata, entityType)
                   && Equals(e.Property(nameof(SubJourney.Id)).CurrentValue, keyValue));
      if (existingEntity == null)
      {
        node.Entry.State = EntityState.Added;
      } else
      {
        // Just ignore (in debugger I see the state is in fact "Detached")
      }
    } else {
      node.Entry.State = EntityState.Added;
    }
  });    

Still the same error (the foreign key constraint violation).

I am running out of options now. Any pointers on how to deal with this would be appreciated.

As a playground, I created a simple demo project (console app) containing the code and all my attempts, with a SqlLocalDb reference, for use with VS 2022:

https://github.com/joergkrause/StackoverflowEFCoreIssue

Thanks for reading through this post :-)

CodePudding user response:

EF works with tracked references. When you have two untracked instances, whether their values are identical or different, they are treated as two distinct records. When you associate them with new parent records, EF will attempt to insert them as brand-new rows, resulting either in duplicate data (if the PKs are overwritten by identity columns) or in exceptions like "An entity with the same Id is already tracked" or unique constraint violations when EF attempts to insert a duplicate row.
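To make "tracked references" concrete, here is a toy identity map. It is only an illustration of the idea and not how EF Core is actually implemented; ToyIdentityMap and the cut-down SubJourney class are made up for this sketch. Tracking the same instance twice is fine, while a second instance with the same key is rejected:

```csharp
using System;
using System.Collections.Generic;

var map = new ToyIdentityMap();
var key = Guid.NewGuid();
var sj1 = new SubJourney { DbKey = key };
var sj2 = new SubJourney { DbKey = key }; // equal values, different instance

map.Track(sj1);
map.Track(sj1); // same reference again: accepted

var secondInstanceRejected = false;
try
{
    map.Track(sj2); // same key, other reference: rejected
}
catch (InvalidOperationException)
{
    secondInstanceRejected = true;
}

Console.WriteLine(secondInstanceRejected); // True
Console.WriteLine(map.Count);              // 1

// Hypothetical stand-in for the entity in the question (key only).
class SubJourney
{
    public Guid DbKey { get; set; }
}

// Toy illustration: one tracked instance per key value.
class ToyIdentityMap
{
    private readonly Dictionary<Guid, SubJourney> _tracked = new();

    public int Count => _tracked.Count;

    public void Track(SubJourney entity)
    {
        if (_tracked.TryGetValue(entity.DbKey, out var existing))
        {
            if (ReferenceEquals(existing, entity))
                return; // already tracking exactly this object

            throw new InvalidOperationException(
                $"Another instance with key {entity.DbKey} is already being tracked.");
        }

        _tracked.Add(entity.DbKey, entity);
    }
}
```

The point is that equality of values plays no role here: only the key and the object reference matter.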

When performing operations with imported/transformed data, you need to take care to account for data records that might already exist, or at minimum, references that the DbContext may already be tracking. This means that, given a set of DTOs, you cannot simply map them into a set of entities and then Add/Update them in the DbContext. Even new top-level entities will often reference existing records, especially in many-to-one relationships.

For example, say I have a list of Orders, each of which contains a Customer reference. I might have three orders: two associated with Customer ID 1 and one with Customer ID 2. In the serialized data I might get something like:

{
  "orders": [
    {
      "Number": "10123",
      "Customer": {
        "Id": 1,
        "Name": "Han Solo"
      }
    },
    {
      "Number": "10124",
      "Customer": {
        "Id": 1,
        "Name": "Han Solo"
      }
    },
    {
      "Number": "10125",
      "Customer": {
        "Id": 2,
        "Name": "Luke Skywalker"
      }
    }
  ]
}

The orders can be expected to be new, although anything unique, such as the order number, should be verified before inserting. The customer, however, might be new or might already exist.

If we use AutoMapper or similar to create Order and Customer entities, we would get three distinct Customer instances, even though two of the orders reference the same customer. Instead, we should be explicit about which entities we actually know we want to insert and which relations we should look up and reuse:

foreach (var orderDto in orderDtos)
{
    // Guard against an order that already exists in the database.
    if (_context.Orders.Any(x => x.OrderNumber == orderDto.Number))
        throw new InvalidOperationException("Order already exists");

    var order = Mapper.Map<Order>(orderDto);

    // Reuse the existing Customer instead of the freshly mapped copy.
    var customer = _context.Customers.SingleOrDefault(x => x.Id == orderDto.Customer.Id);
    if (customer != null)
        order.Customer = customer;

    _context.Orders.Add(order);
}

This assumes that AutoMapper creates a Customer when mapping an Order. If the Customer is expected to exist, I would use Single rather than SingleOrDefault, so the call throws if given a Customer ID that doesn't exist. Beyond this, you would also want to consider how to scope when work is committed to the DB: whether each order insert is a unit of work or the whole batch is. Any existing references need to be resolved and must overwrite the entities created by the mapper. A LINQ query like this always goes to the database, but EF returns the already-tracked instance when that row is being tracked (Find, by contrast, checks the local tracking cache first and only queries the DB if necessary). Either way, it's the best way to guarantee that existing records are referenced and to avoid duplicate data or exceptions.
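One gap in the loop above: if a Customer is new and appears in several orders of the same batch, the database lookup finds nothing and each order still gets its own copy. A minimal sketch of closing that gap with a per-batch dictionary follows. The DTO and entity classes are hypothetical stand-ins shaped like the example data, and the DbContext lookup is left out so the snippet runs on its own:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical DTOs shaped like the serialized orders above.
var orderDtos = new List<OrderDto>
{
    new("10123", new CustomerDto(1, "Han Solo")),
    new("10124", new CustomerDto(1, "Han Solo")),
    new("10125", new CustomerDto(2, "Luke Skywalker")),
};

// One Customer instance per Id for the whole batch.
var customersById = new Dictionary<int, Customer>();
var orders = new List<Order>();

foreach (var dto in orderDtos)
{
    if (!customersById.TryGetValue(dto.Customer.Id, out var customer))
    {
        // In a real import, look the customer up in the DbContext first
        // and only create a new entity when nothing is found.
        customer = new Customer { Id = dto.Customer.Id, Name = dto.Customer.Name };
        customersById.Add(customer.Id, customer);
    }

    orders.Add(new Order { OrderNumber = dto.Number, Customer = customer });
}

// Both "Han Solo" orders now share a single Customer object.
Console.WriteLine(ReferenceEquals(orders[0].Customer, orders[1].Customer)); // True
Console.WriteLine(customersById.Count);                                     // 2

record CustomerDto(int Id, string Name);
record OrderDto(string Number, CustomerDto Customer);

class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

class Order
{
    public string OrderNumber { get; set; } = "";
    public Customer? Customer { get; set; }
}
```

Because every order in the batch points at the same instance, the context would see one Customer per Id rather than several copies with the same key.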

CodePudding user response:

As stated in the post, I can't control the graph because of the serializer used. However, the JSON serializer is more powerful. The links and answers were helpful for further research. I found that a ReferenceResolver can sort this out. Applied to the code in the question, I got this:

internal class SubJourneyResolver : IReferenceResolver
{

  private readonly IDictionary<string, SubJourney> _sjCache = new Dictionary<string, SubJourney>();

  public void AddReference(object context, string reference, object value)
  {
    if (value is SubJourney sj)
    {
      var id = reference;
      if (!_sjCache.ContainsKey(id))
      {
        _sjCache.Add(id, sj);
      }
    }
  }

  public string GetReference(object context, object value)
  {
    if (value is SubJourney sj)
    {
      _sjCache[sj.Id] = sj;
      return sj.Id;
    }
    return null;
  }

  public bool IsReferenced(object context, object value)
  {
    if (value is SubJourney sj)
    {
      return _sjCache.ContainsKey(sj.Id);
    }
    return false;
  }

  public object ResolveReference(object context, string reference)
  {
    var id = reference;
    _sjCache.TryGetValue(id, out var sj);
    return sj;
  }
}

In the JSON it adds $id properties to the objects it has read and replaces the duplicate object with a $ref property. The graph now looks like this:

{
  "$id": null,
  "UserJourneys": {
    "$id": null,
    "$values": [
      {
        "$id": null,
        "Policy": null,
        "OrchestrationSteps": {
          "$id": null,
          "$values": [
            {
              "$id": null,
              "Journey": null,
              "JourneyList": {
                "$id": null,
                "$values": [
                  {
                    "$id": null,
                    "SubJourney": {
                      "$id": "k1",
                      "Policy": null,
                      "Id": "k1",
                      "Type": 0,
                      "DbKey": "00000000-0000-0000-0000-000000000000"
                    },
                    "SubJourneyReferenceId": "Test",
                    "DbKey": "00000000-0000-0000-0000-000000000000"
                  }
                ]
              },
              "Type": 0,
              "DbKey": "00000000-0000-0000-0000-000000000000"
            }
          ]
        },
        "Id": "Journey1",
        "DbKey": "00000000-0000-0000-0000-000000000000"
      }
    ]
  },
  "SubJourneys": {
    "$id": null,
    "$values": [
      {
        "$ref": "k1"
      }
    ]
  },
  "DbKey": "00000000-0000-0000-0000-000000000000"
}

I used another unique property (Id) that is independent of the primary key. Now I deserialize the whole thing back into a .NET object graph. Full code here (Newtonsoft.Json needs to be referenced):

var serialized = JsonConvert.SerializeObject(
    trustFrameworkPolicy,
    new JsonSerializerSettings
    {
      ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
      PreserveReferencesHandling = PreserveReferencesHandling.All,
      TypeNameHandling = TypeNameHandling.Auto,
      ReferenceResolver = new SubJourneyResolver() // solution!
    });

var deserialized = JsonConvert.DeserializeObject<TrustFrameworkPolicy>(serialized, new JsonSerializerSettings
{
  ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
  PreserveReferencesHandling = PreserveReferencesHandling.All,
  TypeNameHandling = TypeNameHandling.Auto
});

The deserialized object is now added to EF context and saved properly.

I still dislike the idea of an additional serialization step for the sole purpose of fixing up object references. For big graphs it consumes a lot of memory. However, I see the advantage of the extreme control the JSON serializer provides.
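A possible alternative without the round trip is a small fix-up pass over the deserialized graph that points every Candidate at one shared SubJourney per Id. The following is only a sketch under assumptions: the classes are cut-down stand-ins for the real model, GraphFixup is a made-up helper, and Id is assumed to be set and unique per logical SubJourney. It relies on the observation from the question that EF Core treats a shared instance as one entity:

```csharp
using System;
using System.Collections.Generic;

// A graph like the one in the question: two equal but distinct SubJourney objects.
var policy = new TrustFrameworkPolicy
{
    SubJourneys = { new SubJourney { Id = "k1" } },
    UserJourneys =
    {
        new UserJourney
        {
            OrchestrationSteps =
            {
                new OrchestrationStepUserJourney
                {
                    JourneyList = { new Candidate { SubJourney = new SubJourney { Id = "k1" } } }
                }
            }
        }
    }
};

GraphFixup.UnifySubJourneys(policy);

var candidate = policy.UserJourneys[0].OrchestrationSteps[0].JourneyList[0];
Console.WriteLine(ReferenceEquals(candidate.SubJourney, policy.SubJourneys[0])); // True

// Hypothetical helper: rewires candidates to one canonical instance per Id.
static class GraphFixup
{
    public static void UnifySubJourneys(TrustFrameworkPolicy policy)
    {
        var canonical = new Dictionary<string, SubJourney>();

        // The policy's own list provides the canonical instances.
        foreach (var sj in policy.SubJourneys)
            canonical.TryAdd(sj.Id, sj);

        foreach (var journey in policy.UserJourneys)
        foreach (var step in journey.OrchestrationSteps)
        foreach (var cand in step.JourneyList)
        {
            if (cand.SubJourney is null)
                continue;

            if (canonical.TryGetValue(cand.SubJourney.Id, out var existing))
                cand.SubJourney = existing;                        // replace the copy
            else
                canonical.Add(cand.SubJourney.Id, cand.SubJourney); // first occurrence wins
        }
    }
}

// Simplified stand-ins for the entities in the question.
class SubJourney { public string Id { get; set; } = ""; }
class Candidate { public SubJourney? SubJourney { get; set; } }
class OrchestrationStepUserJourney { public List<Candidate> JourneyList { get; } = new(); }
class UserJourney { public List<OrchestrationStepUserJourney> OrchestrationSteps { get; } = new(); }
class TrustFrameworkPolicy
{
    public List<UserJourney> UserJourneys { get; } = new();
    public List<SubJourney> SubJourneys { get; } = new();
}
```

The pass touches each candidate once and allocates only the dictionary, so it avoids holding a serialized copy of the graph in memory. With hundreds of entity types it would have to be repeated or generalized for each type that can be duplicated.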

Maybe in the future EF Core will provide a similar way to plug in a "ReferenceResolver" natively to smooth out a complex graph.
