How to improve performance when joining a List and a LINQ object

I have one list, read from file:

var lsData = ReadExcelFile<CustomerEntity>(path);

And one list of customers (loaded into memory):

lsCustomer = await CustomerService.GetAll()
    .Where(c => c.isDeleted == null || !c.isDeleted.Value)
    .OrderBy(c=> c.Code)
    .ToListAsync();

And the join command:

var lsDuplicateEmail = 
    (from imp in lsData
    join cust in lsCustomer
    on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
    into gjoin
    from g in gjoin.DefaultIfEmpty()
    select new
        {
            ImportItem = imp,
            CustomerItem = g,
        }
    into result
    where !string.IsNullOrEmpty(result.ImportItem.Email) && result.CustomerItem != null 
        && !ImportHelpers.CompareString(result.ImportItem.Code, result.CustomerItem.Code)
    select result);

var lsDuplicateEmailInSystem = lsDuplicateEmail.Select(c => c.ImportItem.Code).Distinct().ToList();

I tested with about 2,000 records in lsData and about 200k records in lsCustomer.

The Customer Email field is not indexed in the DB.

The join command takes about 10 s to execute (even though the result is 0 records), which is too slow.

I've looked around but can't find a way to index the Email field in lsCustomer. I believe the join is slow because its complexity is O(n*m).

Is there any way to improve performance?

CodePudding user response：

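Try the following code. I have used Join instead of GroupJoin, which is not needed here: your where clause discards every row without a matching customer, so the left join behaves as an inner join anyway. I have also moved the filters up in the query.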

var lsDuplicateEmail = 
    from imp in lsData
    where !string.IsNullOrEmpty(imp.Email)
    join cust in lsCustomer
       on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
    where !ImportHelpers.CompareString(imp.Code, cust.Code)
    select new
    {
        ImportItem = imp,
        CustomerItem = cust,
    };

Please also show the implementation of GetPerfectStringWithoutSpace; it may be the slow part, since it is called once for every record in both lists.
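To check whether the key function is the bottleneck, one option is to time a single pass of it over the large list, outside the join. The sketch below is only an illustration: NormalizeEmail is a hypothetical stand-in for GetPerfectStringWithoutSpace (whose implementation is not shown in the question), and the Customer and KeyTiming types are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Minimal stand-in for CustomerEntity (assumption: only Code and Email matter here)
public sealed class Customer
{
    public string Code { get; set; }
    public string Email { get; set; }
}

public static class KeyTiming
{
    // Hypothetical replacement for ImportHelpers.GetPerfectStringWithoutSpace:
    // removes all whitespace and lower-cases; null or blank input becomes "".
    public static string NormalizeEmail(string email) =>
        string.IsNullOrWhiteSpace(email)
            ? string.Empty
            : string.Concat(email.Where(ch => !char.IsWhiteSpace(ch))).ToLowerInvariant();

    // Runs the key function once per customer and reports how long that took
    public static (List<string> Keys, TimeSpan Elapsed) ComputeKeys(IEnumerable<Customer> customers)
    {
        var sw = Stopwatch.StartNew();
        var keys = customers.Select(c => NormalizeEmail(c.Email)).ToList();
        sw.Stop();
        return (keys, sw.Elapsed);
    }
}

public static class Program
{
    public static void Main()
    {
        // Synthetic data of roughly the size mentioned in the question
        var customers = Enumerable.Range(0, 200_000)
            .Select(i => new Customer { Code = "C" + i, Email = " User" + i + " @Example.com " })
            .ToList();

        var (keys, elapsed) = KeyTiming.ComputeKeys(customers);
        Console.WriteLine($"{keys.Count} keys computed in {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

If this pass alone accounts for most of the 10 s when run with the real helper, the helper is the thing to optimize; if it finishes quickly, the time is going elsewhere.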

Another possible solution is to swap lsData and lsCustomer in the query, so that the lookup is built from the smaller list; maybe the lookup search over the larger list is not so fast.
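In LINQ to Objects, Join builds a hash lookup from the sequence that follows the join keyword and streams the other one through it. The swap can also be written out by hand: hash the small import list once, then scan the large customer list a single time. The sketch below assumes hypothetical Row, Normalize, and DuplicateEmailFinder names, and it assumes ImportHelpers.CompareString is a case-insensitive equality check, which the question does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for both the imported rows and the customers (assumption)
public sealed class Row
{
    public string Code { get; set; }
    public string Email { get; set; }
}

public static class DuplicateEmailFinder
{
    // Hypothetical stand-in for ImportHelpers.GetPerfectStringWithoutSpace
    public static string Normalize(string s) =>
        string.IsNullOrEmpty(s)
            ? string.Empty
            : string.Concat(s.Where(ch => !char.IsWhiteSpace(ch))).ToLowerInvariant();

    // Returns the distinct import codes whose email already belongs to a
    // customer with a different code.
    public static List<string> FindImportCodes(IEnumerable<Row> imports, IEnumerable<Row> customers)
    {
        // Hash the small list once; rows without an email are filtered up front
        var importsByEmail = imports
            .Where(i => !string.IsNullOrEmpty(i.Email))
            .ToLookup(i => Normalize(i.Email));

        var codes = new HashSet<string>();
        foreach (var cust in customers)
        {
            // A lookup returns an empty sequence for a missing key, so no null check is needed
            foreach (var imp in importsByEmail[Normalize(cust.Email)])
            {
                // Assumption: CompareString means case-insensitive equality
                if (!string.Equals(imp.Code, cust.Code, StringComparison.OrdinalIgnoreCase))
                    codes.Add(imp.Code);
            }
        }
        return codes.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var imports = new List<Row>
        {
            new Row { Code = "A1", Email = "x@y.com" },
            new Row { Code = "A2", Email = "" },
            new Row { Code = "A3", Email = "z@y.com" },
        };
        var customers = new List<Row>
        {
            new Row { Code = "B1", Email = " X@Y.com" },
            new Row { Code = "A3", Email = "z@y.com" },
        };

        foreach (var code in DuplicateEmailFinder.FindImportCodes(imports, customers))
            Console.WriteLine(code);
    }
}
```

Either way the key function still runs once per record in each list, so this only helps if building or probing the 200k-entry lookup is what costs the time.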