I processed 3 TXT files, about 1.3 GB in total, to count the number of occurrences of each search keyword. The operation that triggers the shuffle runs out of memory.
This was executed in spark-shell; there are six workers, each allocated 2 GB of memory.
The session is below. Is this a memory leak? Can 1.3 GB of data really not be processed? And by the way, roughly how much data can Spark handle with a given amount of memory?
scala> val source = sc.textFile("hdfs://node1:9100/user/wzy/sogoudata")
source: org.apache.spark.rdd.RDD[String] = hdfs://node1:9100/user/wzy/sogoudata MapPartitionsRDD[10] at textFile at <console>:24

scala> val key_1 = source.map(x => (x.split("\t")(2), 1))
key_1: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[13] at map at <console>:26

scala> key_1.take(3)
res8: Array[(String, Int)] = Array((ecomax qing,1), (mortals cultivate immortality,1), (laptop alliance,1))

scala> val key_count = key_1.reduceByKey(_ + _)
key_count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[14] at reduceByKey at <console>:28

scala> key_count.take(3)
[Stage 11:======================================>             (8 + 4) / 12] 16/08/05 14:59:51 WARN TaskSetManager: Lost task 3.0 in stage 11.0 (TID 48, 10.130.152.17): java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3236)
	at org.apache.hadoop.io.Text.setCapacity(Text.java:266)
	at org.apache.hadoop.io.Text.append(Text.java:236)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:243)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	...
CodePudding user response:
2 GB is too little. If you want to check whether memory is really the cause, open two terminals: run spark-shell in one, and run free -m in the other to watch how much memory is actually being used.
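For what it's worth, here is a minimal sketch of the two usual fixes: give each executor more heap when launching the shell, or split the input into more partitions so each task holds less data at once. The master URL and the figure of 48 partitions are assumptions for illustration, not taken from the poster's setup; spark-shell's --executor-memory flag and the optional partition-count arguments of textFile and reduceByKey are standard Spark APIs.

$ spark-shell --master spark://node1:7077 --executor-memory 4g

scala> // 48 is a hypothetical partition count, roughly a few partitions per core
scala> val source = sc.textFile("hdfs://node1:9100/user/wzy/sogoudata", 48)
scala> val key_count = source.map(x => (x.split("\t")(2), 1)).reduceByKey(_ + _, 48)
scala> key_count.take(3)

Note that the trace shows the OOM inside Hadoop's LineReader while reading input lines, so the heap filled up before the shuffle even finished. Since the shuffle spills to disk, 1.3 GB should pose no problem with enough partitions; the amount of data Spark can process is not bounded by total cluster memory.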