Hadoop reduce function to output (key,valuelist)


The input data is like:

1 5 
2 8
1 3
2 7
4 9

The objective is to make the first number of each row the key and the second number the value. After shuffling, I want to output (key, value_list).

But I don't know how to output the value list.

The output I expect is:

1,[5,3] 
2,[7,8]
4,[9]

After mapping I got:

1 5
1 3
2 7
2 8
4 9
 public static class Map extends Mapper<LongWritable, Text, IntWritable, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        String token1  = tokenizer.nextToken();
        String token2  = tokenizer.nextToken();
        context.write(new IntWritable(Integer.parseInt(token1)), new Text(token2));

    }
 }



public static class Reduce extends Reducer<IntWritable, Text, IntWritable, Text> {

    String iterableToString(Iterable<Text> values) {
        StringBuilder sb = new StringBuilder("[");

        for (Text val : values) {
            sb.append(val.get()).append(",");
        }
        sb.setLength(sb.length() - 2);
        sb.append("]");
        return sb.toString();
    }

    public void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

        //  context.write(key, new Text(iterableToString(values)));

    }
}

But I get this error message:

compile:
    [javac] /home/zih-yan/hadoop_tutorial/build.xml:12: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
    [javac] Compiling 1 source file to /home/zih-yan/hadoop_tutorial/bin
    [javac] /home/zih-yan/hadoop_tutorial/src/f.java:32: error: cannot find symbol
    [javac]         sb.append(val.get()).append(",");
    [javac]                      ^
    [javac]   symbol:   method get()
    [javac]   location: variable val of type Text
    [javac] Note: /home/zih-yan/hadoop_tutorial/src/f.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] 1 error

Thanks!

CodePudding user response:

To "output a value list", your reducer must emit a Text value, not an ArrayList. Only Hadoop-serializable (Writable) types can be used as map/reduce inputs and outputs.

That said, the string-building logic can be extracted and unit tested on its own:

String iterableToString(Iterable<Text> values) {
    StringBuilder sb = new StringBuilder("[");

    for (Text val : values) {
        // Text has no get(); read the contents with toString()
        sb.append(val.toString()).append(",");
    }
    sb.setLength(sb.length() - 1);  // drop only the trailing comma
    sb.append("]");
    return sb.toString();
}
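
For example, a small JUnit test along these lines can check the formatting without running a job. The test class name is just a placeholder, and it assumes the test can see the Reduce class above (e.g. it sits in the same package, or references the method through its enclosing class):

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.junit.Test;

public class IterableToStringTest {

    @Test
    public void formatsValuesAsBracketedList() {
        // The same kind of Iterable<Text> the reducer receives after shuffling
        List<Text> values = Arrays.asList(new Text("5"), new Text("3"));

        // Adjust the reference (e.g. f.Reduce) to match where the method lives
        assertEquals("[5,3]", new Reduce().iterableToString(values));
    }
}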

Once the tests pass, you can use it in the reducer:

public void reduce(IntWritable key, Iterable<Text> values, Context context)
  throws IOException, InterruptedException {
    context.write(key, new Text(iterableToString(values)));
}
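
For completeness, a driver along these lines will run the job end to end. This is only a sketch: it assumes the enclosing class is named f, as in your compile log, and that the input and output paths are passed as command-line arguments.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class f {

    // ... the Map and Reduce classes shown above go here ...

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "key to value list");
        job.setJarByClass(f.class);
        job.setMapperClass(Map.class);        // mapper from the question
        job.setReducerClass(Reduce.class);    // reducer from the question
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that the default TextOutputFormat separates key and value with a tab, so each output line will look like "1   [5,3]" (tab-separated) rather than "1,[5,3]".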

Regarding your compilation error: Hadoop's Text class has no get() method (that is IntWritable's API), so call toString() on each value instead, and make sure you are importing org.apache.hadoop.io.Text rather than some other Text class.
