There are other questions similar to mine but they didn't help me. I'm performing what should be a simple Linq group by operation, and in SQL Server Management Studio and Linqpad I get 23,859 results from a table containing 36,102 total records. This is what I believe to be the correct result.
For some reason, when I move my query into my Visual Studio application code, I get 22,463 groups - and I cannot for the life of me figure out why.
I need to group this table's rows based on unique combinations of 8 columns. The columns contain account IDs, person IDs, device IDs, premise IDs, and address columns. Basically, a person can have multiple accounts, multiple premises, multiple devices, and each premise can have it's own address. I know the table design is lacking... it's customer provided and there are other columns that necessitate the format - it should not be relevant to the grouping though.
SQL Server: 23859 groups:
SELECT acct_id, per_id, dev_id, prem_id, address, city, state, postal
FROM z_AccountInfo GROUP BY acct_id, per_id, dev_id, prem_id, address, city, state, postal
ORDER BY per_id
Linqpad: 23859 groups:
//Get all rows...
List<z_AccountInfo> zAccounts = z_AccountInfo.ToList();
//Group them...
var zAccountGroups = (from za in zAccounts
group za by new { za.acct_id, za.per_id, za.dev_id, za.prem_id, za.address, za.city, za.state, za.postal } into zaGroups
select zaGroups).OrderBy(zag => zag.Key.per_id).ToList();
Visual Studio: 22463 groups - WRONG?:
//Intantiate list I can use outside of Entity Framework context...
List<z_AccountInfo> zAccounts = new List<z_AccountInfo>();
using (Entities db = Entities.CreateEntitiesForSpecificDatabaseName(implementation))
{
//Get all rows. Count verified to be correct...
zAccounts = db.z_AccountInfo.OrderBy(z => z.per_id).ToList();
}
// Group the rows. Doesn't work??? 22463 groups?
var zAccountGroups = (from z_AccountInfo za in zAccounts
group za by new { za.acct_id, za.per_id, za.dev_id, za.prem_id, za.address, za.city, za.state, za.postal } into zag
select zag).ToList();
I'm hoping someone can spot a syntax issue or something else I'm missing. Seems like Visual Studio is grouping something.. but it's off by 1396 groups... that's pretty significant.
UPDATE: sgmoore's comment below put me on the track of making sure the zAccounts list from Linqpad and Visual Studio match. They do not!?! Querying the table in SQL Server shows this data (account / device / premise)
Inspecting the Visual Studio output in Beyond Compare shows the device ID 6106471 being erroneously repeated / duplicated for the 4 bottom rows... meaning there should be 2 groups here, but my query will only see 1...
Since I'm using Entity Framework to query the data in the table in Visual Studio, this makes me think something is wrong with my model but I have no idea what it could be. Beyond compare shows this same issue happening multiple times and explains why the group numbers are off. It's like EF knows there are 8 rows (in this case) - but the field that differentiates them doesn't come through.
I tried truncating the table and re-adding all of the data into it and re-running and the bad behavior persists. Quite confused here - I've never had this kind of issue with Entity Framework before.
I even ran SQL Profiler when VS was executing and trapped the query Entity Framework is firing to populate zAccounts. That query when fired by itself in SQL Server correctly shows the four 7066550 rows. This seems to be squarely on Entity Framework and the ToList() call that populates the full collection - ideas anyone?
CodePudding user response:
Short answer - make sure the table in the Entity Framework model has an Entity Key on a column where the values of the column are unique.
Longer answer - to troubleshoot I ran SQL Profiler to ensure that the query EF was sending to SQL Server was correct - and it was. I ran that query and inspected the results to see the data I was wanting. The problem was my model. I had an Entity Key set on a field that did not contain unique values. My guess is that EF assumes that since the field is set as the Entity Key, the values must be unique. Based on that it somehow indexes or caches the first row where the "id" is and then projects that row's values into query results. That is a bad assumption in my view if there is not a validation check of the field marked as the Entity Key. I realize I'm to blame here for telling it to use a non-unique field as the Entity Key - but I don't see the case where this would be a good idea without it throwing at least a warning.
Anyway, to resolve, I added a proper id column to the table and set it's Identiy spec and auto-increment so that any rows in the table would have a unique id. After that, I updated my edmx to use my new column as the Entity Key and re-ran my code and then everything magically started working.