I have a working Java MapReduce program with two jobs. The output of the first reducer is written to a file and read by the second mapper.
I would like to change the first reducer's output to a SequenceFile.
How can I do this?
This is the main method of my program:
public static void main(String[] args) throws Exception {
    // set up first job
    Configuration conf = new Configuration();
    conf.set("mapred.textoutputformat.separator", "&");
    Job job = Job.getInstance(conf, "First Job");
    job.setJarByClass(Prova.class);
    job.setMapperClass(FirstMapper.class);
    job.setReducerClass(FirstReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    Path tempOutput = new Path("FirstMapper");
    FileOutputFormat.setOutputPath(job, tempOutput);
    job.waitForCompletion(true);

    // set up second job
    Configuration conf2 = new Configuration();
    conf2.set("mapred.textoutputformat.separator", " ");
    conf2.set("numberOfELements", args[2]);
    Job job2 = Job.getInstance(conf2, "Second Job");
    job2.setJarByClass(Prova.class);
    job2.setMapperClass(SecondMapper.class);
    job2.setReducerClass(SecondReducer.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job2, tempOutput);
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));
    System.exit(job2.waitForCompletion(true) ? 0 : 1);
}
I already tried adding the following lines:
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job2.setInputFormatClass(SequenceFileInputFormat.class);
but I get the following error: wrong value class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.IntWritable
The error happens when I call context.write(Text, Text) in the first reducer.
CodePudding user response:
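context.write(Text, Text) in the first reducer
and job.setOutputValueClass(IntWritable.class) in the driver
disagree with one another. Make them consistent and it should work: either declare the output value class as Text with job.setOutputValueClass(Text.class), or have the reducer write an IntWritable value. The mismatch went unnoticed with the default text output because TextOutputFormat writes each key and value as text without checking its class, whereas the SequenceFile writer checks every record against the declared key and value classes.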
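
As a rough sketch of what the driver changes might look like if you keep writing Text values from the first reducer: the class name SequenceFileWiring and the two helper methods below are hypothetical, and the map output types are an assumption (I am guessing FirstMapper emits Text keys and IntWritable values, so adjust or drop those two lines to match your mapper). It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical helper class; these calls could equally be inlined in main().
public final class SequenceFileWiring {

    // First job: write (Text, Text) records to a SequenceFile.
    static void configureFirstJobOutput(Job job, Path tempOutput) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Must match what FirstReducer passes to context.write(...).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Assumption: FirstMapper emits (Text, IntWritable). When the map
        // output types differ from the reducer output types, they have to
        // be declared separately, otherwise they default to the types above.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        FileOutputFormat.setOutputPath(job, tempOutput);
    }

    // Second job: read the SequenceFile produced by the first job.
    static void configureSecondJobInput(Job job2, Path tempOutput) throws IOException {
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, tempOutput);
    }
}
```

Two things to watch for after the switch. SequenceFileInputFormat hands the stored key and value straight to the mapper, so SecondMapper would need to take Text keys and Text values as input instead of the LongWritable offset and Text line it receives from the default text input. Also, the "&" separator set on the first job only affects text output, so the second mapper no longer has to split each line on it.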