For example, suppose three files need to be processed as mapper input: file01, file02, and file03.
There are also three configuration parameter files stored locally: r1, r2, and r3.
I need the mapper program to load r1, r2, and r3 and generate three different instances. The processing algorithm is the same; only the parameters differ, so the instances would be p1, p2, and p3. I then want p1 to handle only file01, p2 to handle only file02, and p3 to handle only file03, each processed separately on one of three slave nodes without affecting the others, with the computed results summarized by the reducer. Is that possible?
My purpose is to reduce the performance overhead, because the r1, r2, and r3 instances are very large; merging everything into a single computation would be too slow. To my knowledge this requirement cannot be realized on Hadoop, because a Hadoop mapper does not seem able to tell which input file it is reading, which is frustrating. Could some Hadoop expert offer an idea? Thank you very much.
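For reference, a Hadoop mapper actually can find out which input file its split came from: inside `Mapper.setup()` you can call `((FileSplit) context.getInputSplit()).getPath().getName()` and then load the matching parameter file. Below is a minimal, self-contained sketch of just the file-to-parameters selection step; the mapping table (file01 → r1, etc.) and the class name `ConfigSelector` are assumptions for illustration, not part of any Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-file parameter selection described in the question.
// In a real Hadoop Mapper, this lookup would run once in setup():
//   String name = ((FileSplit) context.getInputSplit()).getPath().getName();
// and the returned parameter file (r1/r2/r3) would then be loaded to build
// the instance (p1/p2/p3) used by map() for every record in that split.
public class ConfigSelector {
    // Hypothetical mapping from input file name to parameter file name.
    private static final Map<String, String> FILE_TO_CONFIG = new HashMap<>();
    static {
        FILE_TO_CONFIG.put("file01", "r1");
        FILE_TO_CONFIG.put("file02", "r2");
        FILE_TO_CONFIG.put("file03", "r3");
    }

    public static String configFor(String inputFileName) {
        String config = FILE_TO_CONFIG.get(inputFileName);
        if (config == null) {
            throw new IllegalArgumentException(
                "no parameters registered for " + inputFileName);
        }
        return config;
    }

    public static void main(String[] args) {
        System.out.println(configFor("file01")); // prints r1
    }
}
```

Since each split carries its file identity, the three files can be processed in parallel on different nodes with different parameters while sharing one mapper class, and a single reducer can still aggregate the results. An alternative worth looking at is `MultipleInputs`, which lets you bind a different mapper class to each input path.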
CodePudding user response:
Does no one know? I'm anxious for an answer.