Home > Net >  Fastest way to search huge records using Linq query in AngularJs & C#
Fastest way to search huge records using Linq query in AngularJs & C#

Time:02-23

I have to Perform Global Search on table means if user enters any keyword or multiple keywords and clicks on search button then based on the entered keywords it should bring all the combination records.

We have to search those 2 keywords in every column of a table (Like clause in SQL with OR operator for multiple keywords) and query should fetch the data.

I have around 200k of records in the database. enter image description here

First calling function to load the data

if ((Role)user.Role == Role.InternalAdministrator || (Role)user.Role == 
Role.InternalStaff)
{
listJobs = (
from d in db.Jobs
where d.TimeCreated.Value.Year >= 2020
select new JobModel()
{
AlternatePickupDelivery = d.AlternatePickupDelivery,
Branch = (
from b in db.Branches
where b.BranchId == d.ProcessingCity
select b.Branch1
).FirstOrDefault(),
ClientName = d.ClientName,
ClientId = d.ClientId,
ContactName = d.ContactName,
MatterReference = d.MatterReference,
JMSNumber = d.JmsNumber,
JobDescription = d.JobDescription,
JobId = d.JobId,
JobShortDescription = d.JobShortDescription,
OrderType = d.OrderType,
OrderTypeDisplay = (
from jt in db.JobTypes
where jt.Id == d.OrderType
select jt.JobTypeName
).FirstOrDefault(),
ProcessingCity = d.ProcessingCity ?? 0,
DisplayProcessingCity = (
from jt in db.ProcessingCities
where jt.ProcessingCityId == d.ProcessingCity
select jt.ProcessingCity1
).FirstOrDefault(),
Status = d.Status,
DisplayStatus = (
from jt in db.JobStatuses
where jt.Id == d.Status
select jt.JobStatusName
).FirstOrDefault(),
StatusDisplayOrder = (from js in db.JobStatuses
where js.Id == d.Status
select js.DisplayOrder).FirstOrDefault(),//d.JobStatus.DisplayOrder,
StatusLastModifiedBy = (
from u in db.Users
where (u.UserId == d.StatusLastModifiedById)
select u.FirstName   " "   u.LastName
).FirstOrDefault(),
StatusLastModifiedById = d.StatusLastModifiedById,
StatusLastModified = d.StatusLastModified ?? DateTime.UtcNow,
CreatedByDisplay = (
from u in db.Users
where (u.UserId == d.CreatedById)
select u.FirstName   " "   u.LastName
).FirstOrDefault(),
CreatedById = d.CreatedById,
ModifiedByDisplay = (
from u in db.Users
where (u.UserId == d.LastModifiedById)
select u.FirstName   " "   u.LastName
).FirstOrDefault(),
LastModifiedById = d.LastModifiedById,
TimeCreated = d.TimeCreated ?? DateTime.UtcNow,
TimeDelivered = (d.Status == (int)JMS4.Utilities.JobStatus.Delivered) ? d.StatusLastModified : 
null,
TimeDue = d.TimeDue ?? DateTime.UtcNow,
TimeReady = d.TimeReady ?? DateTime.UtcNow,
TimeZoneId = timeZoneId.ToString(),
ExtClientId = d.ExtClientId,
Address = d.Address,
ReceivedBy = d.ReceivedBy,
ContactPhone = d.ContactPhone,
AfterHourContactNumber = d.AfterHoursContactNumber,
Email = d.Email,
CostEstimateNumber = d.CostEstimateNumber,
LastModifiedBy = d.LastModifiedBy,
MatterType = d.MatterType,
QaData = d.QaData,
InternalInstructions = d.InternalInstructions,
GlobalSearch = d.GlobalSearch
}
);

Then calling second function if search textbox have any keyword/keywords to search

jobs = jobs.Where(x => x.JMSNumber.ToLower().Contains(keyword.ToLower())
|| (x.ClientName != null && x.ClientName.ToLower().Contains(keyword.ToLower()))
|| (x.MatterReference != null && x.MatterReference.ToLower().Contains(keyword.ToLower()))
|| (x.ContactName != null && x.ContactName.ToLower().Contains(keyword.ToLower()))
|| (x.JobShortDescription != null && 
x.JobShortDescription.ToLower().Contains(keyword.ToLower()))
|| (x.StatusLastModifiedBy != null && 
x.StatusLastModifiedBy.ToLower().Contains(keyword.ToLower()))
|| (x.Address != null && x.Address.ToLower().Contains(keyword.ToLower()))
|| (x.Email != null && x.Email.ToLower().Contains(keyword.ToLower()))
|| (x.LastModifiedBy != null && 
x.LastModifiedBy.ToString().ToLower().Contains(keyword.ToLower()))
|| (x.CostEstimateNumber != null && 
x.CostEstimateNumber.ToLower().Contains(keyword.ToLower()))
|| (x.ClientId != null && x.ClientId.ToString().ToLower().Contains(keyword.ToLower()))
|| (x.JobDescription != null && !String.IsNullOrEmpty(x.JobDescription.ToString()) && 
x.JobDescription.ToString().ToLower().Contains(keyword.ToLower()))
|| (x.CreatedByDisplay != null && !String.IsNullOrEmpty(x.CreatedByDisplay.ToString()) && 
x.CreatedByDisplay.ToString().ToLower().Contains(keyword.ToLower()))
|| (x.ModifiedByDisplay != null && !String.IsNullOrEmpty(x.ModifiedByDisplay.ToString()) && 
x.ModifiedByDisplay.ToString().ToLower().Contains(keyword.ToLower()))
|| (x.InternalInstructions != null && 
x.InternalInstructions.ToString().ToLower().Contains(keyword.ToLower()))
);

AFter using these queries, it is taking more than 3 minutes to fetch the records.

Please suggest how can i improve the search performance and optimize the query.

CodePudding user response:

To optimise a query like this against a database there are a few rules to try and follow

  • Make sure the query is passed through to the database, do not operate in memory
  • Remove or reduce the use of functions
  • Don't bother comparing nulls
  • Split the query into multiple parallel queries
  • Improve the structure to optimise the query

The general idea is that you want to make your comparison directly in the index entries, function or conversions on records in the database will not use the indexes. Databases are specifically optimised to query against Indexes, so it will be important to also create the necessary indexes on your search columns.

You have tagged this as so we assume that your query is being passed through to the DB, it is important that you make sure it does. The following code and advice will only work if the LINQ expression is a genuine IQueryable<T> that will be resolved into SQL.

  1. If your database uses a CASE INSENSITIVE collation, then you can drop all the .ToLower() function calls, you want to avoid function calls so any indexes can be accessed directly.

    • Event though C# is sensitive to case by default, if the LINQ query is translated to SQL then it will obey to collation settings for standard LIKE '%' @param '%' comparison.
  2. Skip the null comparison, just like the .ToLower() it is not necessary in SQL to check the nullability of a field first before executing a comparison on that field.

This is already a far better filter:

jobs = jobs.Where(x => x.JMSNumber.Contains(keyword)
                    || x.ClientName.Contains(keyword)
                    || x.MatterReference.Contains(keyword)
                    || x.ContactName.Contains(keyword)
                    || x.JobShortDescription.Contains(keyword)
                    || x.StatusLastModifiedBy.Contains(keyword)
                    || x.Address.Contains(keyword)
                    || x.Email.Contains(keyword))
                    || x.LastModifiedBy.Contains(keyword))
                    || x.CostEstimateNumber.Contains(keyword))
                    || x.ClientId.ToString().Contains(keyword))
                    || x.JobDescription.Contains(keyword))
                    || x.CreatedByDisplay.Contains(keyword))
                    || x.ModifiedByDisplay.Contains(keyword))
                    || x.InternalInstructions.Contains(keyword))
                    );
  1. Store values in a searchable format.
    We can do better that the above if you really need to search on numeric fields, like in this case ClientId then it helps to store the numeric values in a string based column because our search argument is a string.
    • The easiest way to implement a variant of a field in the database is to use a computed column, however it needs to be a write computed or peristed column to realise the benefit for a search index. Make the computed column out of the expression:

      CAST(ClientId as char(10))
      

The same rule applies for any other column that might need a function applied to it, you will see greater performance if you move the function evaluation to the time that the record is INSERT or UPDATE, which happens with a much lower frequency to reads via SELECT.

  1. Normalize the structure, most of your comparisons are on a user - displayname if the query joins on to a user table, then you only have one seek for any user that matches instead of 1 separate seek for each user

There is a clear related table here for the User who is applying the data modifications. This can add a great deal of redundant information in the search query. Ideally we do not search across the user fields, as any match there would bring up all records that are associated with them, it is not usually a good search candidate, unless users do not edit many records. So if you can, exclude them from the general search, and allow the user to pick from a list of users to scope the results, or to search from the users in parallel with the main search

Now the search is much quicker: (this assumes a new column called ClientIdString)

jobs = jobs.Where(x => x.JMSNumber.Contains(keyword)
                    || x.ClientName.Contains(keyword)
                    || x.MatterReference.Contains(keyword)
                    || x.ContactName.Contains(keyword)
                    || x.JobShortDescription.Contains(keyword)
                    || x.Address.Contains(keyword)
                    || x.Email.Contains(keyword))
                    || x.CostEstimateNumber.Contains(keyword))
                    || x.ClientIdString.Contains(keyword))
                    || x.JobDescription.Contains(keyword))
                    || x.InternalInstructions.Contains(keyword))
                    );

Hypothetical User Search for searching and then filtering by the users:

var userIds = db.Users.Where(u => u.UserName.Contains(keyword))
                      .Select(u => u.Id)
                      .ToList();

//Filter to only rows that match the user lookup
if (jobs.Any())
{    
    jobs = jobs.Where(x => userIds.Contains(x.StatusLastModifiedByUserId)
                        || userIds.Contains(x.LastModifiedByUserId)
                        || userIds.Contains(x.CreatedByUserId)
                        || userIds.Contains(x.ModifiedByUserId)
                        );
}

If the schema is NOT already normalised, then I strongly suggest you do at least normalise out the users into their own table, indexing the possible users is much more efficient than searching across 200K records.

It is also possible that we can write the query directly in SQL. Sometimes we can write much more efficient SQL by hand than we might be able to achieve through LINQ don't feel bad about that, just recognise that it is one of many tools at your disposal.

  • Most Linq providers will give you an explicit mechanism for executing raw SQL that will return into a linq expression. The detail for this is out of scope, but searching is a specific scenario where this is accepted.

There are other external options too like SQL Server Full Text Search or MySQL FULLTEXT Indexes or even Microsoft Azure Search or Elastic Search. These external mechanisms can be used to return the search content directly or they might return references that you can use to access the records in your local DB.

  • Many NoSQL providers can be used to construct an efficient search index, the products I listed above as simply designed for searching and are likely to implement a lot of industry

Indexes

All of the above assumes that you have implemented adequate indexes on the underlying data store. Searching can have such a large impact on the user experience, it is worth putting in the effort to get it right.

  • Assessing and Implementing Indexes is out of scope for this question
  • Related