I have the LINQ query below, which returns data like the example that follows. I want to remove the duplicates.
List<EmployeeSalary> lstEmployeeSalary =
new EmployeeSalaryFactory().GetRelatedObjects(inValue, ddlPayDate, payRollType, payrollSearch)
.Select(m => (EmployeeSalary)m)
.ToList();
For example:
Id Name EmpCode Salary DateOfSalary
-------------------------------------------------------------
1 Item1 IT00001 $100 5/26/2021
2 Item2 IT00002 $200 4/26/2021
3 Item3 IT00003 $150 5/26/2021
1 Item1 IT00001 $100 4/26/2021
3 Item3 IT00003 $150 4/26/2021
Expected output:
Id Name EmpCode Salary DateOfSalary
-------------------------------------------------------------
1 Item1 IT00001 $100 5/26/2021
2 Item2 IT00002 $200 4/26/2021
3 Item3 IT00003 $150 5/26/2021
Answer:
Assuming that new EmployeeSalaryFactory().GetRelatedObjects(...) returns a list of EmployeeSalary objects:
List<EmployeeSalary> lstEmployeeSalary =
new EmployeeSalaryFactory().GetRelatedObjects(...)
.GroupBy(x => x.Id)
.Select(g => g.OrderByDescending(o => o.DateOfSalary).First())
.ToList();
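A self-contained sketch of this GroupBy approach, run against the sample rows from the question. The EmployeeSalary class and its property types are assumptions inferred from the table; LatestSalaryDemo, Row and SampleData are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of EmployeeSalary; property types are guessed from the sample table.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
    public DateTime DateOfSalary { get; set; }
}

public static class LatestSalaryDemo
{
    // Builds one sample row (all sample dates fall on the 26th, in 2021).
    private static EmployeeSalary Row(int id, string name, string empCode, decimal salary, int month) =>
        new EmployeeSalary { Id = id, Name = name, EmpCode = empCode, Salary = salary, DateOfSalary = new DateTime(2021, month, 26) };

    // The five rows from the question, in the same order.
    public static List<EmployeeSalary> SampleData() => new List<EmployeeSalary>
    {
        Row(1, "Item1", "IT00001", 100m, 5),
        Row(2, "Item2", "IT00002", 200m, 4),
        Row(3, "Item3", "IT00003", 150m, 5),
        Row(1, "Item1", "IT00001", 100m, 4),
        Row(3, "Item3", "IT00003", 150m, 4),
    };

    // Groups the rows by Id and keeps the row with the latest DateOfSalary in each group.
    public static List<EmployeeSalary> LatestPerId(IEnumerable<EmployeeSalary> salaries) =>
        salaries
            .GroupBy(x => x.Id)
            .Select(g => g.OrderByDescending(o => o.DateOfSalary).First())
            .ToList();
}
```

GroupBy yields the groups in the order in which each Id first appears, so the result lists Ids 1, 2 and 3 with the dates from the expected output.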
Answer:
First of all, don't call ToList() inside your procedures unless you actually use the fact that the result is a List<EmployeeSalary>.
If you only intend to return the fetched data to your caller, consider returning IEnumerable<EmployeeSalary> and let the caller call ToList if it needs a list.
The reason is that if your caller doesn't need all of the fetched data, materializing all of it wastes processing power:
Suppose you have the following property and method to get the EmployeeSalaries:
private EmployeeSalaryFactory EmployeeSalaryFactory {get;} = new EmployeeSalaryFactory();
IEnumerable<EmployeeSalary> GetEmployeeSalaries()
{
return this.EmployeeSalaryFactory
.GetRelatedObjects(inValue, ddlPayDate, payRollType, payrollSearch)
.Select(m => (EmployeeSalary)m);
}
inValue, ddlPayDate, etc. might be parameters of this method, but that's outside the scope of the question.
Now let's use this method:
EmployeeSalary GetSalary(int employeeId)
{
return this.GetEmployeeSalaries()
.Where(salary => salary.Id == employeeId)
.FirstOrDefault();
}
If GetEmployeeSalaries had returned a List<EmployeeSalary>, all salaries would have been materialized, even though the caller only needs the first match.
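To make the cost difference concrete, here is a small sketch that counts how many items a lazy source produces. GetEmployeeIds is a hypothetical stand-in for GetEmployeeSalaries that yields plain ids instead of EmployeeSalary objects, and the Produced counter exists only for the demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyVersusListDemo
{
    // Number of items the source sequence has produced so far.
    public static int Produced;

    // Hypothetical stand-in for GetEmployeeSalaries(): lazily yields the ids 1..1000.
    public static IEnumerable<int> GetEmployeeIds()
    {
        for (int id = 1; id <= 1000; id++)
        {
            Produced++;
            yield return id;
        }
    }

    // Looks up id 3 twice and reports how many items each approach produced.
    public static (int Lazy, int Materialized) CountProducedItems()
    {
        Produced = 0;
        _ = GetEmployeeIds().Where(id => id == 3).FirstOrDefault();
        int lazy = Produced; // enumeration stops as soon as id 3 is found

        Produced = 0;
        _ = GetEmployeeIds().ToList().Where(id => id == 3).FirstOrDefault();
        int materialized = Produced; // ToList pulled every item before the search started

        return (lazy, materialized);
    }
}
```

The lazy query produces only three items, while the ToList version produces all 1000 before searching.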
Back to your question
I want to remove duplications
The answer depends on what you consider a duplicate: when are two EmployeeSalaries equal? When all their properties have equal values, or when they have the same Id (but possibly a different Salary)?
I assume the first: all values should be checked for equality. Note, though, that in your example the rows you want to remove differ from the kept rows in DateOfSalary, so there they are duplicates only by Id.
The quick solution
If you need this for this one usage only (you don't need to unit test it extensively, don't need to prepare for future changes, and don't want to reuse the code for similar problems), consider using Queryable.Distinct before your final Select.
Of course, if the data is in your local process (not in a database), you can use the IEnumerable equivalent, Enumerable.Distinct.
var uniqueSalaries = this.EmployeeSalaryFactory
.GetRelatedObjects(inValue, ddlPayDate, payRollType, payrollSearch)
.Select(salary => new
{
// Select all properties that you need to make a Salary:
Id = salary.Id,
Name = salary.Name,
EmpCode = salary.EmpCode,
Salary = salary.Salary,
Date = salary.DateOfSalary,
})
.Distinct()
Before the Distinct, the selected objects are of an anonymous type. Anonymous types override Equals and GetHashCode to compare by value, not by reference: two objects of the same anonymous type whose properties all have equal values are considered equal. So Distinct removes the duplicates.
If you really need the result to be an IEnumerable<EmployeeSalary>, you'll need a second Select:
.Select(uniqueSalary => new EmployeeSalary
{
Id = uniqueSalary.Id,
Name = uniqueSalary.Name,
...
});
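Here is a runnable sketch of this quick solution for in-memory data, so it uses Enumerable.Distinct rather than Queryable.Distinct. The EmployeeSalary shape is assumed from the sample table, and the second Select is written out under the assumption that these five properties are all there is. Note what it does with rows like the question's: an exact copy is removed, but a row with the same Id and a different DateOfSalary is kept, because its anonymous value differs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of EmployeeSalary; property types are guessed from the sample table.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
    public DateTime DateOfSalary { get; set; }
}

public static class QuickDistinctDemo
{
    // Projects to an anonymous type, lets its value-based Equals/GetHashCode drive
    // Distinct, then converts the unique values back to EmployeeSalary.
    public static List<EmployeeSalary> UniqueByAllValues(IEnumerable<EmployeeSalary> salaries) =>
        salaries
            .Select(s => new { s.Id, s.Name, s.EmpCode, s.Salary, Date = s.DateOfSalary })
            .Distinct()
            .Select(u => new EmployeeSalary
            {
                Id = u.Id,
                Name = u.Name,
                EmpCode = u.EmpCode,
                Salary = u.Salary,
                DateOfSalary = u.Date,
            })
            .ToList();
}
```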
Proper solution
If the input data is in your local process (that is, it is an IEnumerable), you have more LINQ methods at your disposal, like the overload of Enumerable.Distinct that takes an IEqualityComparer<T> parameter.
In that case, my advice would be to create an equality comparer for EmployeeSalaries. This has several advantages. You can reuse the equality comparer for other EmployeeSalary problems. The code that uses it is easier to read. You are prepared for future changes: if you add or remove a property from your definition of equality (for instance if you only need to check the Id), there is only one place you have to change. And you can unit test the comparer to make sure you didn't forget any properties.
private EmployeeSalaryFactory EmployeeSalaryFactory {get;} = new EmployeeSalaryFactory();
private IEqualityComparer<EmployeeSalary> SalaryComparer {get;} = ...;
private IEnumerable<EmployeeSalary> GetEmployeeSalaries() { /* see above */ }
To get the unique salaries:
IEnumerable<EmployeeSalary> uniqueSalaries = this.GetEmployeeSalaries()
.Distinct(this.SalaryComparer);
Notice that because I reuse a lot of code, the specific problem of finding unique salaries is easy to understand.
I cheated a little, though: I moved the problem into the equality comparer.
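As a sketch of how moving the problem into a comparer plays out for the question's data, here is a hypothetical EmployeeIdComparer that treats rows with the same Id as duplicates. Sorting newest first and then calling Distinct keeps the latest row per Id; this relies on LINQ to Objects keeping the first occurrence of each duplicate, which is what the .NET implementation does, although the documentation does not promise an order. On .NET 6 or later, DistinctBy(s => s.Id) does the same without a custom comparer. EmployeeSalary is again an assumed shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of EmployeeSalary; property types are guessed from the sample table.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
    public DateTime DateOfSalary { get; set; }
}

// Hypothetical comparer: two salaries are considered equal when their Id matches.
public class EmployeeIdComparer : EqualityComparer<EmployeeSalary>
{
    public override bool Equals(EmployeeSalary x, EmployeeSalary y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    // Must agree with Equals: equal Ids give equal hash codes.
    public override int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}

public static class LatestPerIdDemo
{
    private static EmployeeSalary Row(int id, string name, string empCode, decimal salary, int month) =>
        new EmployeeSalary { Id = id, Name = name, EmpCode = empCode, Salary = salary, DateOfSalary = new DateTime(2021, month, 26) };

    // The five rows from the question.
    public static List<EmployeeSalary> SampleData() => new List<EmployeeSalary>
    {
        Row(1, "Item1", "IT00001", 100m, 5),
        Row(2, "Item2", "IT00002", 200m, 4),
        Row(3, "Item3", "IT00003", 150m, 5),
        Row(1, "Item1", "IT00001", 100m, 4),
        Row(3, "Item3", "IT00003", 150m, 4),
    };

    // Newest salaries first, then keep the first row seen for each Id.
    public static List<EmployeeSalary> LatestPerId(IEnumerable<EmployeeSalary> salaries) =>
        salaries
            .OrderByDescending(s => s.DateOfSalary)
            .Distinct(new EmployeeIdComparer())
            .ToList();
}
```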
IEquality
Creating a reusable equality comparer is fairly straightforward. The advantage is that you can reuse it wherever you need to compare EmployeeSalaries. If your definition of equality changes in the future, there is only one place you need to change. And there is only one place where you need to unit test whether you implemented the proper definition of equality.
public class EmployeeSalaryComparer : EqualityComparer<EmployeeSalary>
{
public static IEqualityComparer<EmployeeSalary> ByValue {get;} = new EmployeeSalaryComparer();
public override bool Equals (EmployeeSalary x, EmployeeSalary y) {...}
public override int GetHashCode (EmployeeSalary x) {...}
}
Usage would be:
IEqualityComparer<EmployeeSalary> salaryComparer = EmployeeSalaryComparer.ByValue;
EmployeeSalary employee1 = ...
EmployeeSalary employee2 = ...
bool equal = salaryComparer.Equals(employee1, employee2);
Implement equality
public override bool Equals (EmployeeSalary x, EmployeeSalary y)
{
Almost all equality comparers start with the following lines:
if (x == null) return y == null; // true if both null
if (y == null) return false; // because x not null
if (Object.ReferenceEquals(x, y)) return true; // same object
if (x.GetType() != y.GetType()) return false;
After these checks, the real comparison starts. Its implementation depends on what you call equality. You might say that two EmployeeSalaries with the same Id are equal. Our approach is to check all fields, for instance to find out whether the database needs updating because some values have changed:
return x.Id == y.Id
&& x.Name == y.Name
&& x.EmpCode == y.EmpCode
&& x.Salary == y.Salary
&& x.DateOfSalary == y.DateOfSalary;
}
In your definition, are the names "John Doe" and "john doe" equal? And when are two EmpCodes equal?
If you don't want the default comparison for these, or if it might change in the future, consider adding comparer properties to the EmployeeSalaryComparer:
private static IEqualityComparer<string> NameComparer {get;} = StringComparer.InvariantCultureIgnoreCase;
private static IEqualityComparer<string> EmpCodeComparer {get;} = StringComparer.OrdinalIgnoreCase;
...
The equality check then ends like this:
return IdComparer.Equals(x.Id, y.Id)
&& NameComparer.Equals(x.Name, y.Name)
&& EmpCodeComparer.Equals(x.EmpCode, y.EmpCode)
&& SalaryComparer.Equals(x.Salary, y.Salary)
&& DateComparer.Equals(x.DateOfSalary, y.DateOfSalary);
If company policy about names changes in the future, all you have to do is select a different name comparer. And if EmpCode "Boss" should be equal to EmpCode "boss", there is only one place where you need to change the code.
Of course, after a specification change you need to update your unit tests; they will then tell you automatically where you forgot to update the equality comparers.
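Putting these pieces together, here is a sketch of a complete EmployeeSalaryComparer with per-property comparers. The EmployeeSalary shape is assumed from the sample table, and the concrete comparer choices (case-insensitive names and EmpCodes, default comparers for the rest) are only examples. GetHashCode hashes only the Id, so it stays consistent with Equals whichever string comparers you choose.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of EmployeeSalary; property types are guessed from the sample table.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
    public DateTime DateOfSalary { get; set; }
}

public class EmployeeSalaryComparer : EqualityComparer<EmployeeSalary>
{
    public static IEqualityComparer<EmployeeSalary> ByValue { get; } = new EmployeeSalaryComparer();

    // Example choices: change one line here if the definition of equality changes.
    private static IEqualityComparer<int> IdComparer { get; } = EqualityComparer<int>.Default;
    private static IEqualityComparer<string> NameComparer { get; } = StringComparer.InvariantCultureIgnoreCase;
    private static IEqualityComparer<string> EmpCodeComparer { get; } = StringComparer.OrdinalIgnoreCase;
    private static IEqualityComparer<decimal> SalaryComparer { get; } = EqualityComparer<decimal>.Default;
    private static IEqualityComparer<DateTime> DateComparer { get; } = EqualityComparer<DateTime>.Default;

    public override bool Equals(EmployeeSalary x, EmployeeSalary y)
    {
        if (x == null) return y == null;           // true if both null
        if (y == null) return false;               // because x is not null
        if (ReferenceEquals(x, y)) return true;    // same object
        if (x.GetType() != y.GetType()) return false;

        return IdComparer.Equals(x.Id, y.Id)
            && NameComparer.Equals(x.Name, y.Name)
            && EmpCodeComparer.Equals(x.EmpCode, y.EmpCode)
            && SalaryComparer.Equals(x.Salary, y.Salary)
            && DateComparer.Equals(x.DateOfSalary, y.DateOfSalary);
    }

    // Hashing only the Id: objects that Equals calls equal always share an Id.
    public override int GetHashCode(EmployeeSalary obj) =>
        obj == null ? 0 : IdComparer.GetHashCode(obj.Id);
}
```

With these choices, "John Doe"/"Boss" and "john doe"/"boss" compare as equal, while a different Salary still makes two rows unequal.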
GetHashCode
GetHashCode is used to quickly check for inequality. Keywords: quickly, and inequality. If two hash codes are different, we know that the objects are not equal. It is not the other way round: if two hash codes are equal, we don't know whether the objects are equal.
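This shortcut only works if objects that are equal always get equal hash codes. As a sketch of what happens otherwise, here is a hypothetical BrokenIdComparer whose Equals compares the Id but whose GetHashCode uses the month of DateOfSalary (a deliberately wrong choice). For the two Id 1 rows from the question the hash codes differ, so Distinct never asks Equals and keeps both rows; CorrectIdComparer, which hashes the Id, removes the duplicate. The trimmed-down EmployeeSalary here is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down, assumed EmployeeSalary with only the properties this sketch uses.
public class EmployeeSalary
{
    public int Id { get; set; }
    public DateTime DateOfSalary { get; set; }
}

// Hypothetical, deliberately BROKEN comparer: Equals compares the Id,
// but GetHashCode uses the month, so "equal" rows can get different hash codes.
public class BrokenIdComparer : EqualityComparer<EmployeeSalary>
{
    public override bool Equals(EmployeeSalary x, EmployeeSalary y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Id == y.Id);

    public override int GetHashCode(EmployeeSalary obj) => obj.DateOfSalary.Month;
}

// Same Equals, but the hash code comes from the Id that Equals compares.
public class CorrectIdComparer : EqualityComparer<EmployeeSalary>
{
    public override bool Equals(EmployeeSalary x, EmployeeSalary y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Id == y.Id);

    public override int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}

public static class HashContractDemo
{
    // The two Id 1 rows from the question (May and April 2021).
    public static List<EmployeeSalary> Rows() => new List<EmployeeSalary>
    {
        new EmployeeSalary { Id = 1, DateOfSalary = new DateTime(2021, 5, 26) },
        new EmployeeSalary { Id = 1, DateOfSalary = new DateTime(2021, 4, 26) },
    };

    public static int CountDistinct(IEqualityComparer<EmployeeSalary> comparer) =>
        Rows().Distinct(comparer).Count();
}
```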
The hash code is meant to quickly throw away most unequal objects. For instance, in a Distinct method, it would be nice if you could quickly throw away 99% of the objects, so you only have to thoroughly check 1% of the objects for equality.
With EmployeeSalaries we know that if the Ids are different, then the salaries are not equal. Two EmployeeSalaries will rarely have the same Id but a different EmpCode. So by hashing only the Id, we throw away most unequal EmployeeSalaries.
How about this:
public override int GetHashCode (EmployeeSalary x)
{
if (x == null) return 9875578; // just a number for null salaries
return x.Id.GetHashCode();
}
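To see the "throw away most unequal objects quickly" effect, here is a sketch with a hypothetical CountingComparer that counts how often Equals is called. In LINQ to Objects, Distinct keeps the elements it has already seen in a hash set, which compares the stored hash codes first and only calls Equals when two hash codes match. With 1000 distinct Ids plus one exact copy, Equals is needed only for that copy. The trimmed-down EmployeeSalary is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed-down, assumed EmployeeSalary with only the properties this sketch uses.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical comparer that counts how often Distinct falls back to Equals.
public class CountingComparer : EqualityComparer<EmployeeSalary>
{
    public int EqualsCalls { get; private set; }

    public override bool Equals(EmployeeSalary x, EmployeeSalary y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name;
    }

    // Hash only the Id, as in the GetHashCode above.
    public override int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}

public static class HashShortcutDemo
{
    // 1000 salaries with unique Ids plus one exact copy of Id 5.
    public static (int Unique, int EqualsCalls) Run()
    {
        List<EmployeeSalary> salaries = Enumerable.Range(1, 1000)
            .Select(i => new EmployeeSalary { Id = i, Name = "Item" + i })
            .ToList();
        salaries.Add(new EmployeeSalary { Id = 5, Name = "Item5" });

        var comparer = new CountingComparer();
        int unique = salaries.Distinct(comparer).Count();
        return (unique, comparer.EqualsCalls);
    }
}
```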
Conclusion
- We've discussed why it is better to return IEnumerable instead of calling ToList
- We've talked about ways to make your code reusable, easier to read, maintainable and easier to unit test
- We've talked about equality comparers
- We've used Distinct to solve your problem