LINQ Distinct on a particular property and latest

Time:07-11

Suppose I have the following collection:

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

var users = new List<User>
{
    new User {  SSN = "ab", StartDate = new DateTime(2021, 01, 01) },
    new User {  SSN = "ab", StartDate = new DateTime(2021, 01, 02) }, // take this

    new User {  SSN = "ac", StartDate = new DateTime(2021, 01, 01) },
    new User {  SSN = "ac", StartDate = new DateTime(2021, 02, 01) }, // take this

    new User {  SSN = "ad", StartDate = new DateTime(2020, 01, 01) },
    new User {  SSN = "ad", StartDate = new DateTime(2021, 01, 01) },
    new User {  SSN = "ad", StartDate = new DateTime(2022, 01, 01) }, // take this
};

What I am trying to do is get the distinct SSNs, keeping only the entry with the latest StartDate for each one. I wrote two queries that seem to work. Is there a better way in terms of performance?

// only the latest entry per SSN is selected
var distinct = users
    .OrderByDescending(p => p.StartDate)
    .GroupBy(g => g.SSN)
    .Select(x => x.First())
    .ToList();

var ssn = users
    .OrderByDescending(p => p.StartDate)
    .GroupBy(g => g.SSN)
    .Select(x => x.First())
    .Select(x=> x.SSN)
    .ToList();

CodePudding user response:

Since you mentioned in the comments that you also need the latest StartDate, group by SSN and get the latest StartDate via .Max().

var result = users
    .GroupBy(g => g.SSN)
    .Select(x => new User
    {
        SSN = x.Key,
        StartDate = x.Max(y => y.StartDate)
    })
    .ToList();


CodePudding user response:

For better performance you should avoid OrderBy(), as the accepted answer does.

But you can also avoid creating a new User instance by using MaxBy() (available since .NET 6):

var latest = users
    .GroupBy(u => u.SSN)
    .Select(us => us.MaxBy(y => y.StartDate));
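MaxBy() was only added to System.Linq in .NET 6. On an older target, a rough equivalent can be sketched with Aggregate(), which also walks each group once without sorting and returns the existing User instances. The LatestPerSsn class and ViaAggregate method names below are made up for illustration, and User is repeated only so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical helper, not part of any library.
public static class LatestPerSsn
{
    public static List<User> ViaAggregate(IEnumerable<User> users) =>
        users
            .GroupBy(u => u.SSN)
            // Keep whichever of the two candidates has the later StartDate.
            // Groups produced by GroupBy are never empty, so the seedless
            // Aggregate overload cannot throw here.
            .Select(g => g.Aggregate((best, next) =>
                next.StartDate > best.StartDate ? next : best))
            .ToList();
}
```

When two users in a group share the same StartDate, this sketch keeps the first one it encounters.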

CodePudding user response:

What I am trying to do is to get SSN distinct but by only latest StartDate:

You just need to perform the operations in the right order.
First, group by SSN.
Then take the latest element (by StartDate) in each group:

var result = users.GroupBy(u => u.SSN)                     // distinct
                  .Select(g => g.MaxBy(u => u.StartDate)); // latest
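If performance is the main concern, the same "group, then keep the latest" idea can also be written as a single loop over the list with a Dictionary keyed by SSN. This is only a sketch: the LatestBySsn class and Build method are hypothetical names, User is repeated so the snippet compiles on its own, and whether it actually beats the LINQ version should be measured on real data.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical helper, not part of any library.
public static class LatestBySsn
{
    public static Dictionary<string, User> Build(IEnumerable<User> users)
    {
        var latest = new Dictionary<string, User>();
        foreach (var user in users)
        {
            // Store the user if the SSN is new, or if this entry is more
            // recent than the one stored so far.
            if (!latest.TryGetValue(user.SSN, out var current)
                || user.StartDate > current.StartDate)
            {
                latest[user.SSN] = user;
            }
        }
        return latest;
    }
}
```

The Keys of the returned dictionary are the distinct SSNs and its Values are the latest User per SSN, so both results asked for in the question come out of one pass.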

CodePudding user response:

If you group by SSN, and then select the same SSN, the values of StartDate are completely irrelevant.

Hence, the list of distinct SSNs can be obtained by selecting the Key of each grouping, like this:

var ssns = users.GroupBy(u => u.SSN).Select(g => g.Key);
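When only the SSN values are needed, projecting first and calling Distinct() expresses the same thing without building groups. A minimal sketch, where SsnQueries and DistinctSsns are illustrative names and User is repeated so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical helper, not part of any library.
public static class SsnQueries
{
    // Project each user to its SSN, then drop duplicate values.
    public static List<string> DistinctSsns(IEnumerable<User> users) =>
        users.Select(u => u.SSN).Distinct().ToList();
}
```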

Update: As per your further comments, if you need the whole User object with the maximum StartDate for each SSN, you can do something like this:

var latestUsers = users
    .GroupBy(u => u.SSN)
    .Select(g => g.OrderByDescending(u => u.StartDate).First());

For another solution, see also Yong Shun's answer.
