Home > Back-end >  Javascript heap out of memory issue on comparing elements in large for loops
Javascript heap out of memory issue on comparing elements in large for loops

Time:01-08

I have huge JSON file to process which holds around 15,00,000 JSON objects.I am performing some searching operation where I am using two for loops under which I am comapring object values

Below is an example:

const data = [
 {
  "slug": "vertical-lift-module-market",
  "id": 68055,
  "related_reports_updated": {
  "sub_categories": [
    {
      "slug": "audience-analytics-market",
      "id": 66684,
      "short_title": "Audience Analytics Market"
    },
    {
      "slug": "mobile-wallet-market",
      "id": 68830,
      "short_title": "Mobile Wallet Market"
    }
  }
},
{
"slug": "united-states-real-estate-services---growth-trends-and-forecast-2022-- -2027",
"id": 68056,
"related_reports_updated": {
  "sub_categories": [
    {
      "slug": "canada-real-estate-services-market---growth-trends-and-forecast-2020---2025",
      "id": 68051,
      "short_title": "Canada Real Estate Services Market"
    },
    {
      "slug": "germany-real-estate-services-market--growth-trends-and-forecast-2020---2025",
      "id": 68054,
      "short_title": "Germany Real Estate Services Market"
    },
  }
 },
 {
  ...
 }  
]
//This data holds 15,00,000 JSON objects 

What I am trying to do is comparing slug of one object with slug available in sub_categories array of other objects.If its present then create one object and push it into the result array and send that result array.

const result = [];

for(var i=0;i<data.length;i  ) {
  
   for(var j=0;j<data.length;j  ) {

        //Comparing operation
  }

} 

console.log(result);

But after running sometime its giving me this error:

[41955:0x523ce90]   162238 ms: Mark-sweep (reduce) 4096.9 (4102.7) -> 4096.9 (4104.7) 
MB, 3481.7 / 0.4 ms  (average mu = 0.092, current mu = 0.000) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xa3ac10 node::Abort() [node]
2: 0x970199 node::FatalError(char const*, char const*) [node]
3: 0xbba58e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) 
[node]
4: 0xbba907 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char 
const*, bool) [node]
5: 0xd76b25  [node]
6: 0xd776af  [node]
7: 0xd854eb v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, 
v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
8: 0xd890ac v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, 
v8::internal::AllocationType, v8::internal::AllocationOrigin, 
v8::internal::AllocationAlignment) [node]
9: 0xd5778b v8::internal::Factory::NewFillerObject(int, bool, 
v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x109fd4f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, 
v8::internal::Isolate*) [node]
11: 0x1448f59  [node]

Aborted (core dumped)

To get rid of this error I even tried node --max-old-space-size=4096 index.js for maximizing memory for node processes.

But still getting the same issue is there any other way round to resolve this issue and get the desired result.Anyone help me out.

CodePudding user response:

Let's assume that any element have word and number.

  • double precision 64-bit number in js takes 8 bytes*
  • word with 5 letters is around 10 bytes**

If single element have 18 bytes then

1.5e6 elements is 2.7e7 bytes = 27 MB

Independently from code:

         const obj = {};
             
         if(data[i].name == data[j].name) {
              obj.name = data[i].name;
         }

you pushing to result array in double loop. Any of these loops goes through 1.5e6 elements so summary you are going to add to result a lot elements, in this case: 1.5e6 * 1.5e6 = 2.25e12.

Now let's compute size of result

2.25e12 * (8 bytes 10 bytes) = 4.05e13 bytes

We know that 1 gigabyte = 1 000 000 000 bytes = 1e9 bytes 1 terabyte = 1e12 bytes

So in your case you need 40 terabytes to save your result.

Definitely you have to rethink algorithm that you want to implement.


*each number actually takes up an average of 9.7 bytes. source

**we can assume 2 bytes per character for but it can be 5 or more bytes source

CodePudding user response:

If you use a less complex algorithm that does not add duplicate values, it will push much lower number of elements into result array.

let dictionary = {};

// O(n) to build dictionary of all parent slugs
for(let i=0;i<n;i  )
{
   dictionary[data[i].slug]=1; // hashing
}

// O(n*k) to check dictionary instead of doing O(n*n*k) for brute-force
for(let i=0;i<n;i  )
for(let j=0;j<data[i].sub_cat.length;j  )
{
    // O(1)
    if(data[i].sub_cat[j].slug in dictionary)
        dictionary[data[i].sub_cat[j].slug]  ; // 2 => will go to result
}


// add to result O(n)
for (const [key, value] of Object.entries(dictionary)) {
  if(value==2)
      result.push(key);
}  

Since it also compares to itself, like element 5 vs element 5, it may still duplicate as in comparison but not as result item. So you may need to add an extra comparison for that depending on required algorithm.

  • Related