I processed 3 TXT files, about 1.3 GB in total, to count the number of occurrences of each search keyword. The operation that triggers the shuffle runs out of memory.
This was executed in spark-shell; there are six workers, each allocated 2 GB of memory.
The session is below. Is this a memory leak? Can 1.3 GB of data really not be processed? And by the way, roughly how much data can Spark handle with a given amount of memory?
scala> val source = sc.textFile("hdfs://node1:9100/user/wzy/sogoudata")
source: org.apache.spark.rdd.RDD[String] = hdfs://node1:9100/user/wzy/sogoudata MapPartitionsRDD[10] at textFile at <console>:24

scala> val key_1 = source.map(x => (x.split("\t")(2), 1))
key_1: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[13] at map at <console>:26

scala> key_1.take(3)
res8: Array[(String, Int)] = Array((ecomax qing,1), (mortals cultivate immortality,1), (laptop alliance,1))

scala> val key_count = key_1.reduceByKey(_ + _)
key_count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[14] at reduceByKey at <console>:28

scala> key_count.take(3)
[Stage 11:======================================>             (8 + 4) / 12] 16/08/05 14:59:51 WARN TaskSetManager: Lost task 3.0 in stage 11.0 (TID 48, 10.130.152.17): java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3236)
	at org.apache.hadoop.io.Text.setCapacity(Text.java:266)
	at org.apache.hadoop.io.Text.append(Text.java:236)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:243)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	...
CodePudding user response:
2 GB is too little. If you want to check whether memory is really the cause, open two terminals: run spark-shell in one, and run free -m in the other to watch how much memory is actually being used.
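For what it's worth, here is a minimal sketch of the two usual fixes: give each executor more heap when launching the shell, or split the input into more partitions so each task holds less data at once. The master URL and the figure of 48 partitions are assumptions for illustration, not taken from the poster's setup; spark-shell's --executor-memory flag and the optional partition-count arguments of textFile and reduceByKey are standard Spark APIs.

$ spark-shell --master spark://node1:7077 --executor-memory 4g

scala> // 48 is a hypothetical partition count, roughly a few partitions per core
scala> val source = sc.textFile("hdfs://node1:9100/user/wzy/sogoudata", 48)
scala> val key_count = source.map(x => (x.split("\t")(2), 1)).reduceByKey(_ + _, 48)
scala> key_count.take(3)

Note that the trace shows the OOM inside Hadoop's LineReader while reading input lines, so the heap filled up before the shuffle even finished. Since the shuffle spills to disk, 1.3 GB should pose no problem with enough partitions; the amount of data Spark can process is not bounded by total cluster memory.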