I need to find the records that do not contain a value from a studentIDs list. There is no 1:1 matching field: each record has three key fields, and which of them I compare against the studentIDs list depends on a condition.
For example: if a rawData record has a non-empty StudentNumber, I have to concatenate StudentNumber and SequenceNumber and check whether that combination exists in the studentIDs list. Otherwise, I have to concatenate EnrollNumber and SequenceNumber.
I am trying to use a ternary operator inside Contains, as shown below:
var studentIDs = students.Select(x => x.StudentID).Distinct().ToList();
var rawData = StudentManager.GetAllData();
var resultList = rawData.Where(x => !studentIDs.Contains($"{(!string.IsNullOrWhiteSpace(x.StudentNumber) ? (x.StudentNumber + x.SequenceNumber) : (x.EnrollNumber + x.SequenceNumber))}")).ToList();
However, for larger datasets (more than 5K records), it gets noticeably slower. Any suggestions for an alternative approach or a way to improve this would be greatly appreciated, especially if the code (the ternary operator) inside Contains can be simplified.
CodePudding user response:
As @derpischer mentioned in the comments, did you try a HashSet?
Replace the first line with the following:
var studentIDs = new HashSet<string>(students.Select(x => x.StudentID));
This should speed up your execution time, because HashSet<T>.Contains is a hash lookup rather than a linear scan of the list. Please let me know if it works.
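To illustrate the difference, here is a minimal sketch. The sample IDs are made up and only stand in for `students.Select(x => x.StudentID)`; both collections give the same answers, they just get there differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample IDs ("S0" .. "S9999"), not the asker's real data.
var idList = Enumerable.Range(0, 10_000).Select(i => $"S{i}").ToList();
var idSet = new HashSet<string>(idList);

// List<T>.Contains walks the list element by element (O(n) per call);
// HashSet<T>.Contains hashes the value and looks it up directly (O(1) on average).
bool inList = idList.Contains("S9999");
bool inSet = idSet.Contains("S9999");
bool missing = idSet.Contains("S10000");

Console.WriteLine($"{inList} {inSet} {missing}"); // prints "True True False"
```

With the list, the Where over N raw records costs roughly N times the number of IDs in the worst case, which is probably why it degrades as the data grows.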
CodePudding user response:
I think your approach to the logic is fine, but you can present it in a clearer, easier-to-read way. Consider the following:
HashSet<string> studentIDs = students.Select(s => s.StudentID)
                                     .ToHashSet();
string StudentID(RawDataStudent s) => string.IsNullOrWhiteSpace(s.StudentNumber)
    ? $"{s.EnrollNumber}{s.SequenceNumber}"
    : $"{s.StudentNumber}{s.SequenceNumber}";
var rawData = StudentManager.GetAllData();
var resultList = rawData.Where(s => !studentIDs.Contains(StudentID(s)))
                        .ToList();
Important points:
- Pulling the entire Contains lambda out into a clearly named function that states its intent makes the code much more readable.
- Always try to work in the affirmative with booleans. Specifically, you ended up with an awkward bracketed negation around the null-or-whitespace check; just swap the two branches of the ternary, which is also easier to read.
- As other posters have commented, calling Contains on a HashSet will be considerably faster than calling it on a List.
- Note that I've had to guess the return type of GetAllData (RawDataStudent here), which is a good example of when var is a bit evil.
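On simplifying the ternary itself: both branches append SequenceNumber, so the ternary only really needs to pick the prefix. Below is a self-contained sketch of that variant. The tuple shape, the sample values, and the LookupKey name are my own assumptions for illustration; substitute whatever GetAllData actually returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the distinct student IDs.
var studentIDs = new HashSet<string> { "A1", "E72" };

// Hypothetical stand-in for StudentManager.GetAllData().
var rawData = new List<(string StudentNumber, string EnrollNumber, string SequenceNumber)>
{
    ("A",  "E9", "1"),  // key "A1"  -> in studentIDs
    ("",   "E7", "2"),  // key "E72" -> in studentIDs
    ("B",  "E5", "3"),  // key "B3"  -> not in studentIDs
    ("  ", "E8", "4"),  // key "E84" -> not in studentIDs
};

// The ternary chooses only the prefix; SequenceNumber is appended once.
string LookupKey((string StudentNumber, string EnrollNumber, string SequenceNumber) r) =>
    (string.IsNullOrWhiteSpace(r.StudentNumber) ? r.EnrollNumber : r.StudentNumber)
    + r.SequenceNumber;

var resultList = rawData.Where(r => !studentIDs.Contains(LookupKey(r))).ToList();

foreach (var r in resultList)
    Console.WriteLine(LookupKey(r)); // prints "B3" then "E84"
```

This produces the same keys as the two-branch version above, so which one you use is mostly a matter of taste.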