Suppose I have the following collection:
public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}
var users = new List<User>
{
    new User { SSN = "ab", StartDate = new DateTime(2021, 01, 01) },
    new User { SSN = "ab", StartDate = new DateTime(2021, 01, 02) }, // take this
    new User { SSN = "ac", StartDate = new DateTime(2021, 01, 01) },
    new User { SSN = "ac", StartDate = new DateTime(2021, 02, 01) }, // take this
    new User { SSN = "ad", StartDate = new DateTime(2020, 01, 01) },
    new User { SSN = "ad", StartDate = new DateTime(2021, 01, 01) },
    new User { SSN = "ad", StartDate = new DateTime(2022, 01, 01) }, // take this
};
What I am trying to do is get the distinct SSNs, keeping only the entry with the latest StartDate for each. I created two queries which seem to work. Is there a better way in terms of performance?
// returns the whole User with the latest StartDate for each SSN
var distinct = users
    .OrderByDescending(p => p.StartDate)
    .GroupBy(g => g.SSN)
    .Select(x => x.First())
    .ToList();
// returns only the SSN values
var ssn = users
    .OrderByDescending(p => p.StartDate)
    .GroupBy(g => g.SSN)
    .Select(x => x.First())
    .Select(x => x.SSN)
    .ToList();
CodePudding user response:
Since you mentioned in the comments that you also need the latest StartDate, group by SSN and get the latest StartDate via .Max().
var result = users
    .GroupBy(g => g.SSN)
    .Select(x => new User
    {
        SSN = x.Key,
        StartDate = x.Max(y => y.StartDate)
    })
    .ToList();
CodePudding user response:
For better performance you should avoid OrderBy(), as the accepted answer does. You can also avoid creating a new User instance by using MaxBy():
var latest = users
    .GroupBy(u => u.SSN)
    .Select(us => us.MaxBy(y => y.StartDate));
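Note that MaxBy() was added to LINQ in .NET 6. If you are on an older framework, a rough equivalent is to use Aggregate() inside each group, which also avoids sorting. The following is only a sketch: the LatestPerSsn class and its Latest method are names made up for illustration, and it assumes every group is non-empty (which GroupBy guarantees).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical helper class, not part of the question or of LINQ.
public static class LatestPerSsn
{
    public static List<User> Latest(IEnumerable<User> users)
    {
        return users
            .GroupBy(u => u.SSN)
            // Aggregate keeps a running "latest so far" element:
            // one pass over each group, no sorting, no new User instances.
            .Select(g => g.Aggregate((best, next) =>
                next.StartDate > best.StartDate ? next : best))
            .ToList();
    }

    public static void Main()
    {
        var users = new List<User>
        {
            new User { SSN = "ab", StartDate = new DateTime(2021, 1, 1) },
            new User { SSN = "ab", StartDate = new DateTime(2021, 1, 2) },
            new User { SSN = "ac", StartDate = new DateTime(2021, 2, 1) },
        };

        foreach (var u in Latest(users))
            Console.WriteLine(u.SSN + " " + u.StartDate.ToString("o"));
    }
}
```

Because the comparison is a strict >, the first element wins when two entries in a group share the same StartDate.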
CodePudding user response:
"What I am trying to do is to get SSN distinct but by only latest StartDate":
You just need to do the operations in the right order. First group by SSN. Then get the latest element (by StartDate) in each group:
var result = users.GroupBy(u => u.SSN) // distinct
.Select(g => g.MaxBy(u => u.StartDate)); // latest
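If the grouping itself turns out to matter for performance, the same "group, then keep the latest" idea can also be written as a single loop over a Dictionary. This is a different technique from the LINQ query above, shown only as a sketch: the LatestUsers class and SinglePass method are hypothetical names, it assumes SSN is never null (a null key would throw), and whether it is actually faster for your data is something you would have to measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical helper class for illustration only.
public static class LatestUsers
{
    public static List<User> SinglePass(IEnumerable<User> users)
    {
        // Maps each SSN to the latest User seen so far.
        var latestBySsn = new Dictionary<string, User>();

        foreach (var user in users)
        {
            User current;
            // Store the user if the SSN is new or this StartDate is later.
            if (!latestBySsn.TryGetValue(user.SSN, out current)
                || user.StartDate > current.StartDate)
            {
                latestBySsn[user.SSN] = user;
            }
        }

        // Dictionary does not guarantee enumeration order; sort if order matters.
        return latestBySsn.Values.ToList();
    }

    public static void Main()
    {
        var users = new List<User>
        {
            new User { SSN = "ad", StartDate = new DateTime(2020, 1, 1) },
            new User { SSN = "ad", StartDate = new DateTime(2022, 1, 1) },
            new User { SSN = "ad", StartDate = new DateTime(2021, 1, 1) },
        };

        foreach (var u in SinglePass(users))
            Console.WriteLine(u.SSN + " " + u.StartDate.ToString("o"));
    }
}
```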
CodePudding user response:
If you group by SSN, and then select the same SSN, the values of StartDate are completely irrelevant. Hence, the list of distinct SSNs can be obtained by selecting the Key of each grouping, like this:
var ssns = users.GroupBy(u => u.SSN).Select(g => g.Key);
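Since only the SSN values are needed here, projecting first and then calling Distinct() should give the same set of values without building groups. A minimal sketch, assuming the User class from the question (the SsnQueries class and DistinctSsns method are names invented for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical helper class for illustration only.
public static class SsnQueries
{
    public static List<string> DistinctSsns(IEnumerable<User> users)
    {
        return users
            .Select(u => u.SSN) // project to the SSN first
            .Distinct()         // drop duplicate SSN values
            .ToList();
    }

    public static void Main()
    {
        var users = new List<User>
        {
            new User { SSN = "ab", StartDate = new DateTime(2021, 1, 1) },
            new User { SSN = "ab", StartDate = new DateTime(2021, 1, 2) },
            new User { SSN = "ac", StartDate = new DateTime(2021, 1, 1) },
        };

        foreach (var ssn in DistinctSsns(users))
            Console.WriteLine(ssn);
    }
}
```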
Update: As per your further comments, if you need the whole user object with the maximum StartDate for each SSN, you can do something like this:
// use a new variable name: "var users = users" would not compile
var latestUsers = users
    .GroupBy(u => u.SSN)
    .Select(g => g.OrderByDescending(u => u.StartDate).First());
For another solution, see also Yong Shun's answer.