I have one list, read from file:
var lsData = ReadExcelFile<CustomerEntity>(path);
And one Object (loaded into memory):
lsCustomer = await CustomerService.GetAll()
.Where(c => c.isDeleted == null || !c.isDeleted.Value)
.OrderBy(c=> c.Code)
.ToListAsync();
And the join command:
var lsDuplicateEmail =
(from imp in lsData
join cust in lsCustomer
on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
into gjoin
from g in gjoin.DefaultIfEmpty()
select new
{
ImportItem = imp,
CustomerItem = g,
}
into result
where !string.IsNullOrEmpty(result.ImportItem.Email) && result.CustomerItem != null
&& !ImportHelpers.CompareString(result.ImportItem.Code, result.CustomerItem.Code)
select result);
var lsDuplicateEmailInSystem = lsDuplicateEmail.Select(c => c.ImportItem.Code).Distinct().ToList();
I perform test with lsData list about 2000 records, lsCustomer about 200k records.
The Customer Email field is not indexed in the DB.
The join command executes with about 10s (even though the result is 0 records), too slow.
I've looked around and can't seem to index the email field in lsCustomer. I know the reason for the slowness is because the complexity is O(n*m).
Is there any way to improve performance?
CodePudding user response:
Try the following code. Instead of GroupJoin, which is not needed here I have used Join
. Also moved filters up in query.
var lsDuplicateEmail =
from imp in lsData
where !string.IsNullOrEmpty(imp.Email)
join cust in lsCustomer
on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
where !ImportHelpers.CompareString(imp.Code, cust.Code)
select new
{
ImportItem = imp,
CustomerItem = cust,
};
Also show GetPerfectStringWithoutSpace
implementation, maybe it is slow.
Another possible solution is to swap lsData
and lsCustomer
in query, maybe lookup search is not so fast.