Difference between Any() and FirstOrDefault() !=null when querying database with EF Core [closed]-CodePudding

I am new to ASP.NET Core and Entity Framework Core.

In my application I am implementing sign up system which accepts json as the user input.

In my case I want to prevent from duplication in database. For example I want to prevent email duplication for the user in database before inserting new row into my database. So first I want to check whether we have saved a user with some specific email in our database or not.

I considered 2 choices: using .Any() and .SingleOrDeafualt() != null.

From a performance point of view, which one is better and faster?

CodePudding user response：

(In EF5) a C# of

context.User.Any(x => x.Email == y)

would be translated to something like

SELECT CASE WHEN EXISTS(SELECT null FROM User WHERE Email = @y) THEN 1 ELSE 0 END

And

context.User.FirstOrDefault(x => x.Email == y)

would be

SELECT TOP 1 * FROM User WHERE Email = @y

While

context.User.SingleOrDefault(x => x.Email == y)

would be

SELECT TOP 2 * FROM User WHERE Email = @y

As such I'd expect the overall performance to run from best to worst in order that they're listed, chiefly because the first query pulls no irrelevant table data at all and can be answered entriely from the index you'll put on Email

The second pulls 1 and the third pulls up to two (so it can throw an exception if there are two because you asked for single) rows of data from disk after using the index to locate them.

The actually querying to know if there is data will probably be a drop in the ocean of the overall time taken; the bigger hit will be retrieving data from disk and sending it over a network. Sure, you could pare down your XOrDefault forms so they only pull some dummy constant rather than a full object's data, but then that's even more mess in your code

Looping round to my comment though; (especially)if you expect duplicates to come up less often, just "making the columns unique and trying an insert/let it explode if the values have been seen before", would be the best way as the DB will enforce the integrity and is the central authority. There is often a possibility that your client side will simultaneously invoke the same check on the same ID etc, both threads will conclude that the data is not in the db and both will insert it, so your "preventing duplicates" logic might not actually work..

Sure you could look to lock stuff and prevent the race conditions but honestly, that's a lot of performance crippling wheel reinventing to come up with something that isn't as good as what the DB can do already..

With a unique index you hand the problem of preventing dupes off to the database and it's very good at it

CodePudding user response：

If you're just querying off the main object type (e.g. Users.Any(u => u.Email == email)), Any is slightly better because it only needs to return a boolean value from the database rather than the entire user.

If the check you're talking about is in a subquery (Users.Where(u => u.Emails.Any(e => e.Address == email))), Entity Framework Core will likely throw an exception if you try to swap out Any for FirstOrDefault(...) == null.

Probably most importantly, Any() is simpler and more expressive of what you're trying to check. So go with that.