The input data looks like this:
1 5
2 8
1 3
2 7
4 9
The objective is to use the first number of each row as the key and the second number as the value. After the shuffle I want to output (key, value_list).
But I don't know how to output the value list.
The output I expect is:
1,[5,3]
2,[7,8]
4,[9]
After mapping I got:
1 5
1 3
2 7
2 8
4 9
public static class Map extends Mapper<LongWritable, Text, IntWritable, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        String token1 = tokenizer.nextToken();
        String token2 = tokenizer.nextToken();
        context.write(new IntWritable(Integer.parseInt(token1)), new Text(token2));
    }
}
public static class Reduce extends Reducer<IntWritable, Text, IntWritable, Text> {
    String iterableToString(Iterable<Text> values) {
        StringBuilder sb = new StringBuilder("[");
        for (Text val : values) {
            sb.append(val.get()).append(",");
        }
        sb.setLength(sb.length() - 2);
        sb.append("]");
        return sb.toString();
    }
    public void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // context.write(key, new Text(iterableToString(values)));
    }
}
But I get this error message:
compile:
[javac] /home/zih-yan/hadoop_tutorial/build.xml:12: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to /home/zih-yan/hadoop_tutorial/bin
[javac] /home/zih-yan/hadoop_tutorial/src/f.java:32: error: cannot find symbol
[javac] sb.append(val.get()).append(",");
[javac] ^
[javac] symbol: method get()
[javac] location: variable val of type Text
[javac] Note: /home/zih-yan/hadoop_tutorial/src/f.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 error
Thanks!
CodePudding user response:
If you want to "output value list", then your reducer must output a Text, not an ArrayList: only Hadoop-serializable (Writable) types can be used as your inputs and outputs.
That said, the string-building logic can be extracted and unit tested on its own:
String iterableToString(Iterable<Text> values) {
    StringBuilder sb = new StringBuilder("[");
    for (Text val : values) {
        // Text has no get(); read its contents with toString()
        sb.append(val.toString()).append(",");
    }
    if (sb.length() > 1) {
        // drop only the single trailing comma
        sb.setLength(sb.length() - 1);
    }
    sb.append("]");
    return sb.toString();
}
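For example, a small JUnit 4 test could pin down the expected format before running a job. This is only a sketch: it assumes JUnit and hadoop-common are on the test classpath, and it reaches the helper through f.Reduce, a guess based on your file name f.java; since iterableToString is package-private, the test also has to live in the same package.
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collections;
import org.apache.hadoop.io.Text;
import org.junit.Test;

public class IterableToStringTest {

    // Assumes the reducer is reachable as f.Reduce; adjust to your outer class name.
    private final f.Reduce reducer = new f.Reduce();

    @Test
    public void joinsValuesIntoBracketedList() {
        Iterable<Text> values = Arrays.asList(new Text("5"), new Text("3"));
        assertEquals("[5,3]", reducer.iterableToString(values));
    }

    @Test
    public void handlesSingleValue() {
        Iterable<Text> values = Collections.singletonList(new Text("9"));
        assertEquals("[9]", reducer.iterableToString(values));
    }
}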
Once the tests pass, you can use the helper in the reducer:
public void reduce(IntWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    context.write(key, new Text(iterableToString(values)));
}
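For completeness, here is a minimal driver sketch showing how the map and reduce output types line up. The main method is meant to sit in the same class as Map and Reduce (imports go at the top of the file); the outer class name f and the argument-based paths are placeholders, and the separator setting is only needed if you want "1,[5,3]" rather than the default tab between key and value.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Emit "key,value" instead of the default tab-separated output
    conf.set("mapreduce.output.textoutputformat.separator", ",");
    Job job = Job.getInstance(conf, "group values per key");
    job.setJarByClass(f.class);  // placeholder outer class name
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}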
Regarding your compilation error: org.apache.hadoop.io.Text has no get() method (that accessor belongs to IntWritable), so call toString() on the value instead, and make sure the Text you import is org.apache.hadoop.io.Text rather than some other Text class.