In the following code, 1Mil strings of equal length get created. Then they are looped through to find a matching string. The first run has strings that are 3 times longer than the second run.
The expected outcome was that the time it takes to do equality comparison of strings of different length would not vary due to 'string interning'. However, results show that a string with 3 times the length takes about 3 times to do an equality check. Why is that?
import { v4 as uuidv4 } from 'uuid';
export const uuid = () => {
return uuidv4();
};
function createSingleId(howManyUuidsInOneId1: number) {
let id = '';
for (let j = 0; j < howManyUuidsInOneId1; j ) {
id = uuid();
}
return id;
}
function generate(howManyIds: number, howManyUuidsInOneId: number) {
const ids = [];
for (let i = 0; i < howManyIds; i ) {
ids.push(createSingleId(howManyUuidsInOneId));
}
return ids;
}
const main = (howManyIds: number, howManyUuidsInOneId:number) => {
const ids = generate(howManyIds, howManyUuidsInOneId);
const toFind = createSingleId(howManyUuidsInOneId);
console.log(`Sample id being compared: '${toFind}'`);
const before = new Date().getTime();
ids.filter(id => id === toFind);
console.log(`Took '${new Date().getTime() - before}ms' to loop through and equal compare '${howManyIds}' when stacked '${howManyUuidsInOneId}' in single id`);
};
main(1000000, 3);
main(1000000, 1);
Output:
Sample id being compared: 'dc03bf00-6f2a-48d9-b3ca-b6ac45782c5cefaa92c0-9372-4f47-bcec-f9fbb41d4625e0c5c278-b574-4a9f-a77e-110cbc6bf601'
Took '64ms' to loop through and equal compare '1000000' when stacked '3' in single id
Sample id being compared: '07e693ce-49a1-4cc6-90e1-0bd99629123b'
Took '19ms' to loop through and equal compare '1000000' when stacked '1' in single id
> node --version
v15.14.0
CodePudding user response:
The expected outcome was that the time it takes to do equality comparison of strings of different length would not vary due to 'string interning'.
No, string interning only means that for some strings you know they're the same because they're stored in the same location, e.g. for string values created from the same string literals. But not all strings (especially not dynamically created ones) are getting interned, and having different memory addresses says nothing about the contents of the strings. If the memory location check fails, you still need to compare the string contents as usual.
Some example to demonstrate this:
function generateString(len) {
let x = "";
for (let i=0; i<len; i ) x = String.fromCharCode(64 i%64);
return x;
}
function time(callback, desc) {
const before = performance.now();
const res = callback();
console.log(`Took ${performance.now()-before}ms to ${desc}`);
return res;
}
const strLen = 5000000;
const a = generateString(strLen);
const b = generateString(strLen);
console.assert(a === b);
const str = a;
time(() => str === a, 'compare a with itself');
time(() => str === b, 'compare a with b');
a
and b
have the same contents, but are different string objects (in memory) because they were accumulated in different generateString
calls. str
references the same value that a
does.