Spark WordCount problem: the output is always wrong, please help

Time:09-27

I started spark-shell, entered a test input,
and then executed a statement on it. The output is always 1. I have been stuck on this for a long time and it is driving me crazy, so I hope an expert can help! Here is my HDFS file:
As the last figure shows, the file contains a lot of words, so how can the result be 1?

CodePudding user response:

Hello, what you are counting is not the number of words but the number of elements in the RDD. You need to do this:
val words = readmeFile.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.collect().foreach(println)
This counts how many times each word occurs.
You can join the Spark technology exchange group 366436387 to learn from each other.

CodePudding user response:

As the figure shows, this counts the number of rows, not the number of words.
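The difference between counting rows and counting words can be seen without a cluster. The following is a minimal sketch using plain Scala collections, not Spark. The object name RowsVsWords, its helper names, and the sample line are hypothetical, and groupBy stands in for reduceByKey, which plain collections do not have.

```scala
object RowsVsWords {
  // Number of elements when every line is one element,
  // which is what a line-based read produces.
  def rowCount(lines: Seq[String]): Int = lines.size

  // Total number of words after splitting every line on whitespace.
  def wordTotal(lines: Seq[String]): Int =
    lines.flatMap(_.split("\\s+")).count(_.nonEmpty)

  // Occurrences of each distinct word.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, hits) => word -> hits.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical file content: a single line holding several words.
    val lines = Seq("to be or not to be")
    println(rowCount(lines))   // prints 1
    println(wordTotal(lines))  // prints 6
    println(wordCounts(lines)) // one (word, count) entry per distinct word
  }
}
```

A file with a single line therefore has a row count of 1 no matter how many words that line holds.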

CodePudding user response:

Your statement only reads the file, and the file contains just one line.
textFile splits its input on newlines by default, so the output value is 1.

val words = readmeFile.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
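To check this behaviour end to end, here is a sketch of a small local-mode Spark program. It assumes Spark is on the classpath. The object name WordCountSketch, the helper countWords, and the sample sentence are made up for illustration, and sc.parallelize stands in for reading the HDFS file so that the RDD holds one element per line, as textFile would produce.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  // Split each line into words, pair every word with 1,
  // then sum the 1s per word.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]") // assumption: run locally, no cluster needed
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for sc.textFile(...): a "file" with a single line.
    val readmeFile = sc.parallelize(Seq("hello spark hello hdfs"))

    // count() returns the number of elements (lines), not words.
    println(readmeFile.count()) // prints 1

    // Bring the per-word totals back to the driver and print them.
    countWords(readmeFile).collect().foreach(println)

    spark.stop()
  }
}
```

In spark-shell, where sc already exists, the body of countWords followed by collect() is enough.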