I have an array of strings with approximately 50,000 elements.
export const companies = [
"000014",
"000016",
"000017",
"000019",
"000020",
"000021",
"000023",
"000025",
...
]
These are the names of companies for which I display certain pages. I made a middleware in which I run a loop and walk through this large array.
import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";
export async function middleware(req: NextRequest) {
const { pathname } = req.nextUrl;
// cycle for comparing current URL with companies
await for (let i = 0; i < companies.length; i ) {
if (pathname.startsWith(`/${companies[i]}`))
return NextResponse.redirect(new URL("/", req.url)); // redirect to main page if companies compare with current pathname
}
}
It takes some time, how can it be optimized? There was an idea to divide the array into chunks, but this is also not a very good option.
CodePudding user response:
Assuming pathname looks something like /000014/abc/xyz
then you can get rid of the array iteration entirely. Something like this:
import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";
const companiesSet = new Set(companies);
export async function middleware(req: NextRequest) {
const { pathname } = req.nextUrl;
// Ideally you could get this from req.params.companyId instead, but whether that exists or not would depend on the routing code, which you haven't shown.
const companyId = req.pathname.match(/\/([0-9] )\//)?.[1];
if (companiesSet.has(companyId)) {
return NextResponse.redirect(new URL("/", req.url)); // redirect to main page if companies compare with current pathname
}
}
That being said, 50,000 elements isn't really that large, all things considered, and that code shouldn't have been particularly slow. The unnecessary await
and the string building inside of the loop may have been an issue.
CodePudding user response:
If for some reason you can't just do map / object lookups, like @Aurast suggested (for example, you need to do free text searching to find closest matching name), then one very simple approach is to group companies so you can divide the list into smaller chunks, and search only one chunk. For example, divide by initial letter:
export const companies = {
"a": ["aardvarks, inc", "AppendeciteCorp", ],
"b": ["boulevard bricklayers", "bandits and bakers", "brompton fan club"],
}
Then you can look up by just first letter, in order to get a much smaller array. This would be very simple to implement.
The other approach is to take this to its logical conclusion, and recursively group by first letter in a data structure called a trie
or prefix tree
. This allows you to very quickly search for stuff even if it's not an exact match.
CodePudding user response:
There are a few ways to improve performance
- Algorithm: you can use a faster search algorithm such as binary search or hash table lookup. These algorithms have a lower time complexity and can search through the array more efficiently.
here is an example using Binary Search
Binary Search Algorithm
The array should be sorted in order to use
Binary Search
import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";
function binarySearch(array, target) {
let left = 0;
let right = array.length - 1;
while (left <= right) {
const middle = Math.floor((left right) / 2);
const value = array[middle];
if (value === target) {
return middle;
} else if (value < target) {
left = middle 1;
} else {
right = middle - 1;
}
}
return -1;
}
export async function middleware(req: NextRequest) {
const { pathname } = req.nextUrl;
const target = pathname.substring(1);
// Use binary search to find company name in array
const index = binarySearch(companies, target);
if (index !== -1) {
return NextResponse.redirect(new URL("/", req.url));
}
}
Hash table
import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";
const hashTable = {};
for (const company of companies) {
hashTable[company] = true;
}
export async function middleware(req: NextRequest) {
const { pathname } = req.nextUrl;
const target = pathname.substring(1);
// Use hash table to find company name in array
if (hashTable[target]) {
return NextResponse.redirect(new URL("/", req.url));
}
}
- cache: If the search results are not changing frequently, you can use a cache to store the search results and avoid the need to search through the entire array every time. This can greatly reduce the time it takes to search through the array, especially if the same searches are being performed repeatedly.
Caching
import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";
const cache = new Map(); // Create a new cache
function linearSearch(array, target) {
// Check if search result is in cache
if (cache.has(target)) {
return cache.get(target);
}
// Search through the array
for (let i = 0; i < array.length; i ) {
if (array[i] === target) {
cache.set(target, i); // Add search result to cache
return i;
}
}
return -1;
}
export async function middleware(req: NextRequest) {
const { pathname } = req.nextUrl;
const target = pathname.substring(1);
// Use linear search to find company name in array
const index = linearSearch(companies, target);
if (index !== -1) {
return NextResponse.redirect(new URL("/", req.url));
}
}
Alternatively, You can use the
Redis
database for caching.
Time Complexity
Linear search: O(n),
where n is the size of the array. This means that the search time increases linearly with the size of the array. Linear search is the most basic search algorithm and is relatively easy to implement, but it can be slow for large arrays.Binary search: O(log n),
where n is the size of the array. This means that the search time increases logarithmically with the size of the array. Binary search is more efficient than linear search, but it requires that the array be sorted.Hash table: O(1),
where n is the size of the array. This means that the search time is constant and does not depend on the size of the array. Hash tables are very efficient for search and insertion operations, but they can take up a lot of memory.Cache: O(1),
where n is the size of the cache. This means that the search time is constant and does not depend on the size of the array. Caches can be used to store the results of expensive searches and avoid the need to search through the entire array every time. The size of the cache is usually much smaller than the size of the array, so the search time is generally very fast. However, the cache can fill up over time and may need to be periodically cleared to free up memory.