I have a huge JSON file to process which holds around 1,500,000 JSON objects. I am performing a search operation where I am using two for loops, comparing object values.
Below is an example:
const data = [
  {
    "slug": "vertical-lift-module-market",
    "id": 68055,
    "related_reports_updated": {
      "sub_categories": [
        {
          "slug": "audience-analytics-market",
          "id": 66684,
          "short_title": "Audience Analytics Market"
        },
        {
          "slug": "mobile-wallet-market",
          "id": 68830,
          "short_title": "Mobile Wallet Market"
        }
      ]
    }
  },
  {
    "slug": "united-states-real-estate-services---growth-trends-and-forecast-2022-- -2027",
    "id": 68056,
    "related_reports_updated": {
      "sub_categories": [
        {
          "slug": "canada-real-estate-services-market---growth-trends-and-forecast-2020---2025",
          "id": 68051,
          "short_title": "Canada Real Estate Services Market"
        },
        {
          "slug": "germany-real-estate-services-market--growth-trends-and-forecast-2020---2025",
          "id": 68054,
          "short_title": "Germany Real Estate Services Market"
        }
      ]
    }
  },
  {
    ...
  }
];
// This data holds around 1,500,000 JSON objects
What I am trying to do is compare the slug of one object with the slugs in the sub_categories array of the other objects. If a match is present, create one object, push it into the result array, and send that result array.
const result = [];
for (let i = 0; i < data.length; i++) {
  for (let j = 0; j < data.length; j++) {
    // Comparing operation
  }
}
console.log(result);
But after running for some time it gives me this error:
[41955:0x523ce90] 162238 ms: Mark-sweep (reduce) 4096.9 (4102.7) -> 4096.9 (4104.7)
MB, 3481.7 / 0.4 ms (average mu = 0.092, current mu = 0.000) allocation failure scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xa3ac10 node::Abort() [node]
2: 0x970199 node::FatalError(char const*, char const*) [node]
3: 0xbba58e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool)
[node]
4: 0xbba907 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char
const*, bool) [node]
5: 0xd76b25 [node]
6: 0xd776af [node]
7: 0xd854eb v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace,
v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
8: 0xd890ac v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int,
v8::internal::AllocationType, v8::internal::AllocationOrigin,
v8::internal::AllocationAlignment) [node]
9: 0xd5778b v8::internal::Factory::NewFillerObject(int, bool,
v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x109fd4f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*,
v8::internal::Isolate*) [node]
11: 0x1448f59 [node]
Aborted (core dumped)
To get rid of this error I even tried node --max-old-space-size=4096 index.js to increase the memory available to the Node process. But I am still getting the same issue. Is there any other way to resolve this and get the desired result? Can anyone help me out?
CodePudding user response:
Let's assume that each element holds one word and one number.
- a double-precision 64-bit number in JS takes 8 bytes*
- a word with 5 letters is around 10 bytes**
If a single element takes 18 bytes, then
1.5e6 elements take 2.7e7 bytes = 27 MB
Independently of the exact comparison code:
const obj = {};
if(data[i].name == data[j].name) {
obj.name = data[i].name;
}
you are pushing to the result array inside a double loop. Each of these loops goes through 1.5e6 elements, so in total you may add up to 1.5e6 * 1.5e6 = 2.25e12 elements to result.
Now let's compute the size of result:
2.25e12 * (8 bytes + 10 bytes) = 4.05e13 bytes
We know that 1 gigabyte = 1 000 000 000 bytes = 1e9 bytes, and 1 terabyte = 1e12 bytes.
So in your case you would need about 40 terabytes to store your result.
You definitely have to rethink the algorithm that you want to implement.
*each number actually takes up an average of 9.7 bytes (source)
**we can assume 2 bytes per character, but it can be 5 or more bytes (source)
CodePudding user response:
If you use a less complex algorithm that does not add duplicate values, it will push a much lower number of elements into the result array.
const dictionary = {};
const result = [];
// O(n) to build dictionary of all parent slugs
for (let i = 0; i < data.length; i++) {
  dictionary[data[i].slug] = 1; // hashing
}
// O(n*k) to check dictionary instead of doing O(n*n*k) for brute-force
for (let i = 0; i < data.length; i++) {
  const subs = data[i].related_reports_updated.sub_categories;
  for (let j = 0; j < subs.length; j++) {
    // O(1)
    if (subs[j].slug in dictionary)
      dictionary[subs[j].slug]++; // 2 or more => will go to result
  }
}
// add to result O(n)
for (const [key, value] of Object.entries(dictionary)) {
  if (value >= 2)
    result.push(key);
}
Since this also compares an element against itself (e.g. element 5 vs element 5), a slug listed in its own object's sub-categories would still be counted, although it appears only once in the result. You may need to add an extra check for that, depending on the required algorithm.