I have a map function that emits data in the following form (the values of the keys are not important):
key: "somevalue"
value: "value \t comma separated values"
For example:
key:"0"
value:"5\t1,2,3,4"
If I use this code:
Text debug;
for (Text val : values) {
    String[] segments = val.toString().split("\t");
    debug = new Text();
    debug.set(val.toString());
    context.write(key, debug);
}
I get the right output, such as
key value
0 8 1,2,4,5
0 2 0,4,5
But if I try this code, the output gets weird:
Text debug;
for (Text val : values) {
    String[] segments = val.toString().split("\t");
    debug = new Text();
    if (val.toString().split("\t").length > 1) {
        try {
            debug.set(val.toString().split("\t")[1]);
        } catch (Exception e) {
            debug.set("Exception");
        }
    }
    context.write(key, debug);
}
The expected output would be:
key second part of value (after \t)
1 2,3,4,5,6
1 4,5,6,6,7
However the output I get is this:
key Tab (tab character after key)
1TAB
1TAB
...
2TAB
If I replace the try...catch with if...else:
Text debug;
for (Text val : values) {
    String[] segments = val.toString().split("\t");
    debug = new Text();
    if (val.toString().split("\t").length > 1) {
        debug.set(val.toString().split("\t")[1]);
    } else {
        debug.set("only one");
    }
    context.write(key, debug);
}
This gives the result
0 only one
...
100 only one
What's going on? I checked in plain Java, and calling "1\t2".split("\t") gives me ["1", "2"].
CodePudding user response:
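I found the problem: I was using the same class as both a combiner and a reducer. The combiner ran first and already stripped everything up to the tab, so the values reaching the reducer no longer contained a tab and split("\t") returned a single element. That is why the try...catch version wrote an empty value and the if...else version wrote "only one". I just needed to use the class only as the reducer.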
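
To illustrate the effect of running the same logic twice, here is a minimal plain-Java sketch with no Hadoop involved. The class name CombinerSketch and the helpers secondPart and applyToAll are hypothetical stand-ins for the reduce logic in the question, not part of any Hadoop API. Applying the logic once models a reducer-only job; applying it twice models the same class running as combiner and then as reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Hypothetical stand-in for the reduce logic in the question:
    // keep only the part after the tab, or report "only one" if there is no tab.
    static String secondPart(String value) {
        String[] segments = value.split("\t");
        return segments.length > 1 ? segments[1] : "only one";
    }

    // Apply the logic to every value, as one pass (combiner or reducer) would.
    static List<String> applyToAll(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.add(secondPart(v));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample map output values in the "value \t comma separated values" form.
        List<String> mapOutput = Arrays.asList("5\t1,2,3,4", "8\t1,2,4,5");

        // Reducer only: the tab is still present, so the second part is extracted.
        for (String v : applyToAll(mapOutput)) {
            System.out.println("reducer only: " + v);
        }

        // Combiner then reducer: the first pass removes the tab,
        // so the second pass finds a single segment.
        for (String v : applyToAll(applyToAll(mapOutput))) {
            System.out.println("combiner + reducer: " + v);
        }
    }
}
```

In the second loop every value comes out as "only one", which matches the output in the question. More generally, a class is only safe to reuse as a combiner if its output has the same format as the map output it consumes, since the reducer has to be able to read either one.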