I am trying to implement a Linq to SQL operator (EF core)
q.TakeDistinct(n, src => src.Field)
So that it returns the first n elements from q only counting distinct Field
Basically I want a Take
that will not count elements which do not differ in Field
For example:
Job Status
10 start
10 progress
10 alert
10 done
12 start
12 error
12 done
32 start
32 info
32 done
then
ctx.Table.TakeDistinct(2, v => v.Job)
will return:
10 start
10 progress
10 alert
10 done
12 start
12 error
12 done
I tried various combinations of Take
and Distinct
but no luck .. any leads appreciated.
Note I am looking for something that will translate to a reasonable efficient SQL by Linq-to-SQL (not in memory processing)
EDIT:
This works:
ctx.Table.Where(z =>
Table.Select(x => x.Field).Distinct().Take(n)
.Contains(z.Field)
);
If is however really slow (in LinqPad) .. is there a better way ?
CodePudding user response:
In SQL, I'd be looking at a WHERE IN subquery.
SELECT JobId, Status
FROM Jobs
WHERE Status In (SELECT DISTINCT TOP 2 Status FROM Jobs)
Not a lot of experience with LinqToSQL but this might work directly:
var TakeJobs = db.Jobs.Select(j => j.JobId).Distinct().Take(2);
var entries = db.Jobs.Where(j => TakeJobs.Contains(j.JobId));
Apparently this might be cheaper with a join to the subquery. I'm not sure how that looks in Linq.
SELECT JobId, Status FROM Jobs j
RIGHT JOIN (SELECT DISTINCT TOP 2 JobId FROM Jobs) t ON t.JobId = j.JobId;
CodePudding user response:
I understand your idea. I wrote this extensions method for implement. But it need to optimize because abuse LinQ may impact performance.
public static IEnumerable<T> TakeDistinct<T, TProperty>(this IEnumerable<T> source, int take, Expression<Func<T, TProperty>> distinctBy)
{
var orderList = source.AsQueryable().GroupBy(distinctBy).OrderBy(x=>x.Key).Take(take).Select(x=>x.Key).ToList();
var mappingFunc = distinctBy.Compile();
return source.Where(x => orderList.Contains(mappingFunc.Invoke(x)));
}
static void Main()
{
var jobLines = @"10 start
10 progress
10 alert
10 done
12 start
12 error
12 done
32 start
32 info
32 done";
var jobs = jobLines.Split('\n').Select(line =>
{
var splited = line.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
return new JobItem
{
Job = Convert.ToInt32(splited[0]),
Status = splited[1]
};
});
var newJobs = jobs.TakeDistinct(2, x => x.Job);
foreach (var job in newJobs)
{
Console.WriteLine($"{job.Job} - {job.Status}");
}
Console.ReadLine();
}
class JobItem
{
public int Job { get; set; }
public string Status { get; set; }
}
public static IEnumerable<T> TakeDistinct<T, TProperty>(this IEnumerable<T> source, int take, Expression<Func<T, TProperty>> distinctBy)
{
var orderList = source.AsQueryable().GroupBy(distinctBy).OrderBy(x=>x.Key).Take(take).Select(x=>x.Key).ToList();
var mappingFunc = distinctBy.Compile();
return source.Where(x => orderList.Contains(mappingFunc.Invoke(x)));
}
--- UPDATED----
This update is to support LinQ to SQL
public static IQueryable<T> TakeDistinct<T, TProperty>(this IQueryable<T> source, int take, Expression<Func<T, TProperty>> distinctBy)
{
var orderList = source.AsQueryable().GroupBy(distinctBy).OrderBy(x => x.Key).Take(take).SelectMany(x => x);
return orderList;
}
--- UPDATED ----
I add generated SQL from statement above.
I've created fake table name JobItems
with Columns: Id
, Job
and Status
SELECT
[Extent2].[Id] AS [Id],
[Extent2].[Job] AS [Job],
[Extent2].[Status] AS [Status]
FROM (SELECT TOP (2) [Distinct1].[Job] AS [Job]
FROM ( SELECT DISTINCT
[Extent1].[Job] AS [Job]
FROM [dbo].[JobItems] AS [Extent1]
) AS [Distinct1]
ORDER BY [Distinct1].[Job] ASC ) AS [Limit1]
INNER JOIN [dbo].[JobItems] AS [Extent2] ON [Limit1].[Job] = [Extent2].[Job]
CodePudding user response:
The following extension generates expression tree which makes Distinct().Take(x)
and generates join to original query. It accepts also x => new { x.Id, x.Id2 }
argument as distinct key.
public static class QueryableExtensions
{
public static IQueryable<T> TakeDistinct<T, TKey>(this IQueryable<T> source, int take,
Expression<Func<T, TKey>> distinctBy)
{
var distinctQuery = source.Select(distinctBy).Distinct().Take(take);
var distinctParam = Expression.Parameter(typeof(TKey), "d");
var entityParam = distinctBy.Parameters[0];
var mapping = MapMembers(distinctBy.Body, distinctParam).ToList();
var whereExpr = mapping.Select(t => Expression.Equal(t.Item1, t.Item2))
.Aggregate(Expression.AndAlso);
var whereLambda = Expression.Lambda(whereExpr, entityParam);
var selectManySelector =
Expression.Lambda(
Expression.Convert(
Expression.Call(typeof(Queryable), nameof(Queryable.Where), new[] { typeof(T) },
source.Expression,
whereLambda),
typeof(IEnumerable<T>)
),
distinctParam
);
var selectManyQuery = Expression.Call(typeof(Queryable), nameof(Queryable.SelectMany),
new[] { typeof(TKey), typeof(T) }, distinctQuery.Expression, selectManySelector);
return source.Provider.CreateQuery<T>(selectManyQuery);
}
private static IEnumerable<Tuple<Expression, Expression>> MapMembers(Expression expr, Expression projectionPath)
{
switch (expr.NodeType)
{
case ExpressionType.New:
{
var ne = (NewExpression)expr;
for (int i = 0; i < ne.Arguments.Count; i )
{
foreach (var e in MapMembers(ne.Arguments[i], Expression.MakeMemberAccess(projectionPath, ne.Members[i])))
{
yield return e;
}
}
break;
}
default:
yield return Tuple.Create(projectionPath, expr);
break;
}
}
}