Home > Enterprise >  Linq take distinct
Linq take distinct

Time:09-17

I am trying to implement a Linq to SQL operator (EF core)

q.TakeDistinct(n, src => src.Field)

So that it returns the first n elements from q only counting distinct Field

Basically I want a Take that will not count elements which do not differ in Field

For example:

Job Status
10   start
10   progress
10   alert
10   done
12   start
12   error
12   done
32   start
32   info
32   done

then

ctx.Table.TakeDistinct(2, v => v.Job)

will return:

10   start
10   progress
10   alert
10   done
12   start
12   error
12   done

I tried various combinations of Take and Distinct but no luck .. any leads appreciated.

Note I am looking for something that will translate to a reasonable efficient SQL by Linq-to-SQL (not in memory processing)

EDIT:

This works:

ctx.Table.Where(z => 
  Table.Select(x => x.Field).Distinct().Take(n)
  .Contains(z.Field)
);

If is however really slow (in LinqPad) .. is there a better way ?

CodePudding user response:

In SQL, I'd be looking at a WHERE IN subquery.

SELECT JobId, Status
FROM Jobs
WHERE Status In (SELECT DISTINCT TOP 2 Status FROM Jobs)

Not a lot of experience with LinqToSQL but this might work directly:

var TakeJobs = db.Jobs.Select(j => j.JobId).Distinct().Take(2);
var entries = db.Jobs.Where(j => TakeJobs.Contains(j.JobId));

Apparently this might be cheaper with a join to the subquery. I'm not sure how that looks in Linq.

SELECT JobId, Status FROM Jobs j
RIGHT JOIN (SELECT DISTINCT TOP 2 JobId FROM Jobs) t ON t.JobId = j.JobId;

CodePudding user response:

I understand your idea. I wrote this extensions method for implement. But it need to optimize because abuse LinQ may impact performance.

public static IEnumerable<T> TakeDistinct<T, TProperty>(this IEnumerable<T> source, int take, Expression<Func<T, TProperty>> distinctBy)
    {
        var orderList = source.AsQueryable().GroupBy(distinctBy).OrderBy(x=>x.Key).Take(take).Select(x=>x.Key).ToList();
        var mappingFunc = distinctBy.Compile();
        return source.Where(x => orderList.Contains(mappingFunc.Invoke(x)));
    }
static void Main()
    {

        var jobLines = @"10   start
10   progress
10   alert
10   done
12   start
12   error
12   done
32   start
32   info
32   done";

        var jobs = jobLines.Split('\n').Select(line =>
        {
            var splited = line.Split(new[] { "   " }, StringSplitOptions.RemoveEmptyEntries);
            return new JobItem
            {
                Job = Convert.ToInt32(splited[0]),
                Status = splited[1]
            };
        });

        var newJobs = jobs.TakeDistinct(2, x => x.Job);
        foreach (var job in newJobs)
        {
            Console.WriteLine($"{job.Job} - {job.Status}");
        }
        Console.ReadLine();
    }

    class JobItem
    {
        public int Job { get; set; }
        public string Status { get; set; }
    }

    public static IEnumerable<T> TakeDistinct<T, TProperty>(this IEnumerable<T> source, int take, Expression<Func<T, TProperty>> distinctBy)
    {
        var orderList = source.AsQueryable().GroupBy(distinctBy).OrderBy(x=>x.Key).Take(take).Select(x=>x.Key).ToList();
        var mappingFunc = distinctBy.Compile();
        return source.Where(x => orderList.Contains(mappingFunc.Invoke(x)));
    }

Result

--- UPDATED----

This update is to support LinQ to SQL

public static IQueryable<T> TakeDistinct<T, TProperty>(this IQueryable<T> source, int take, Expression<Func<T, TProperty>> distinctBy)
{
    var orderList = source.AsQueryable().GroupBy(distinctBy).OrderBy(x => x.Key).Take(take).SelectMany(x => x);
    return orderList;
}

--- UPDATED ----

I add generated SQL from statement above. I've created fake table name JobItems with Columns: Id, Job and Status

SELECT 
    [Extent2].[Id] AS [Id], 
    [Extent2].[Job] AS [Job], 
    [Extent2].[Status] AS [Status]
    FROM   (SELECT TOP (2) [Distinct1].[Job] AS [Job]
        FROM ( SELECT DISTINCT 
            [Extent1].[Job] AS [Job]
            FROM [dbo].[JobItems] AS [Extent1]
        )  AS [Distinct1]
        ORDER BY [Distinct1].[Job] ASC ) AS [Limit1]
    INNER JOIN [dbo].[JobItems] AS [Extent2] ON [Limit1].[Job] = [Extent2].[Job]

CodePudding user response:

The following extension generates expression tree which makes Distinct().Take(x) and generates join to original query. It accepts also x => new { x.Id, x.Id2 } argument as distinct key.

public static class QueryableExtensions
{
    public static IQueryable<T> TakeDistinct<T, TKey>(this IQueryable<T> source, int take,
        Expression<Func<T, TKey>> distinctBy)
    {
        var distinctQuery = source.Select(distinctBy).Distinct().Take(take);

        var distinctParam = Expression.Parameter(typeof(TKey), "d");
        var entityParam   = distinctBy.Parameters[0];

        var mapping = MapMembers(distinctBy.Body, distinctParam).ToList();

        var whereExpr = mapping.Select(t => Expression.Equal(t.Item1, t.Item2))
            .Aggregate(Expression.AndAlso);
        var whereLambda = Expression.Lambda(whereExpr, entityParam);

        var selectManySelector =
            Expression.Lambda(
                Expression.Convert(
                    Expression.Call(typeof(Queryable), nameof(Queryable.Where), new[] { typeof(T) },
                        source.Expression,
                        whereLambda),
                    typeof(IEnumerable<T>)
                ),
                distinctParam
            );

        var selectManyQuery = Expression.Call(typeof(Queryable), nameof(Queryable.SelectMany),
            new[] { typeof(TKey), typeof(T) }, distinctQuery.Expression, selectManySelector);


        return source.Provider.CreateQuery<T>(selectManyQuery);
    }

    private static IEnumerable<Tuple<Expression, Expression>> MapMembers(Expression expr, Expression projectionPath)
    {
        switch (expr.NodeType)
        {
            case ExpressionType.New:
            {
                var ne = (NewExpression)expr;

                for (int i = 0; i < ne.Arguments.Count; i  )
                {
                    foreach (var e in MapMembers(ne.Arguments[i], Expression.MakeMemberAccess(projectionPath, ne.Members[i])))
                    {
                        yield return e;
                    }
                }
                break;
            }

            default:
                yield return Tuple.Create(projectionPath, expr);
                break;
        }
    }
}

  • Related