MapReduce - reduce running while map is not finished

Time:11-04

I've implemented a simple WordCount application in Hadoop. My cluster has one NameNode and 4 DataNodes, and the replication factor is set to 4. I have put many lorem ipsum files into the file system. While running the WordCount application, I see the reducer working even though the mappers haven't finished yet.

2021-10-29 14:53:31,044 INFO mapreduce.Job:  map 70% reduce 23%

How does this work? Many tutorial pages state (this one, for example): "A reducer cannot start while a mapper is still in progress" https://www.talend.com/resources/what-is-mapreduce/

How can the reducers do any work if the map output isn't complete yet?

CodePudding user response:

Once data is emitted by a mapper, it undergoes two steps:

  1. It is shuffled - this is the process of sending data to the correct reducer depending on its key and the partitioner logic.
  2. It is sorted - each reducer merges the map outputs it has received, so that all values for a given key end up grouped together.
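As a rough illustration of the two steps above, here is a minimal Python sketch. It assumes the default hash-style partitioning, where the reducer index is the key's non-negative hash modulo the number of reducers. The helper names (`java_string_hash`, `partition`, `shuffle`) are hypothetical, and the hash function is a stand-in modeled on Java's `String.hashCode()`; the hash of a real Hadoop `Text` key is computed differently.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Stand-in hash modeled on Java's String.hashCode() (signed 32-bit)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer: clear the sign bit, then take the hash modulo the reducer count."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def shuffle(pairs, num_reducers: int) -> dict:
    """Route each (key, value) pair to its reducer, then sort each reducer's input by key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return {reducer: sorted(items) for reducer, items in buckets.items()}


if __name__ == "__main__":
    map_output = [("lorem", 1), ("ipsum", 1), ("lorem", 1), ("dolor", 1)]
    for reducer, items in sorted(shuffle(map_output, 4).items()):
        print(reducer, items)
```

The point of the sketch is that routing depends only on the key, so each pair can be sent to its reducer as soon as a mapper has produced it, without waiting for the other mappers.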

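So while the mappers are still emitting data, reducer tasks have already been started and are fetching and sorting that data as it arrives. The reduce percentage in the job log includes this shuffle and sort work, which is why it can rise above 0% before the map phase reaches 100%. You're correct that the reduce function itself won't start processing values until all mappers have finished.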
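The log line from the question can be reproduced with a small back-of-the-envelope model. This is only a sketch. It assumes the reported reduce percentage is the average of three equally weighted phases (copy, sort, reduce), and that the copy phase has fetched roughly the same fraction of map output as the reported map progress. The function names are hypothetical.

```python
PHASES = ("copy", "sort", "reduce")


def reported_reduce_progress(copy: float, sort: float = 0.0, reduce: float = 0.0) -> float:
    """Average the three reduce-side phases, each assumed to weigh one third."""
    for name, value in zip(PHASES, (copy, sort, reduce)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} progress must be between 0 and 1")
    return (copy + sort + reduce) / 3.0


def format_log_line(map_progress: float, reduce_progress: float) -> str:
    """Format progress values the way the job client prints them."""
    return f"map {map_progress:.0%} reduce {reduce_progress:.0%}"


if __name__ == "__main__":
    # Assumption: 70% of the map output has been fetched, nothing sorted or reduced yet.
    print(format_log_line(0.70, reported_reduce_progress(copy=0.70)))
    # prints: map 70% reduce 23%
```

Under these assumptions, any reduce value below about 33% means the reducers are still only copying map output, which is consistent with the quoted tutorial statement as long as "start" is read as "start calling the reduce function". How early reducers are launched is controlled by the `mapreduce.job.reduce.slowstart.completedmaps` property.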
