If you have to write data into different files depending on the reducer key, use org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

1. Usage pattern for job submission:
```java
Job job = new Job();
// LazyOutputFormat avoids creating empty default part files
// when all records are written through MultipleOutputs
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
// if your data have to be compressed
TextOutputFormat.setCompressOutput(job, true);
TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
// the reducer emits NullWritable keys, so the job's output key
// class must be NullWritable rather than Text
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, outDir);
```
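The reducer below uses the `write(key, value, baseOutputPath)` variant, so no named outputs need to be registered. If you would rather use named outputs (for example, to mix output formats), you can register them in the driver; a minimal sketch, where the output name "text" is just an illustration:

```java
// hypothetical named output "text"; in the reducer you would then
// call mos.write("text", NullWritable.get(), val) instead
MultipleOutputs.addNamedOutput(job, "text",
        TextOutputFormat.class, NullWritable.class, Text.class);
```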
2. In your reducer:

```java
public class YourReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        // closing MultipleOutputs flushes the per-file writers;
        // without this the output files stay empty or truncated
        mos.close();
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String fileName = generateFileName(key);
        for (Text val : values) {
            // the third argument is the base output path of the target
            // file, relative to the job's output directory
            mos.write(NullWritable.get(), val, fileName);
        }
    }

    // your method to generate a file name based on the key
    private String generateFileName(Text key) {
        return key.toString() + Constants.FILE_NAME_PREFIX;
    }
}
```
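For completeness, the snippet in step 1 still needs the usual wiring before submission. A minimal sketch, where YourDriver and YourMapper are placeholder names:

```java
// hypothetical driver tail
job.setJarByClass(YourDriver.class);
job.setMapperClass(YourMapper.class);
job.setReducerClass(YourReducer.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
```

Each distinct key then ends up in its own set of files under the job's output directory: MultipleOutputs appends the usual partition suffix to the base output path, so the name generated by generateFileName produces files like `<key><prefix>-r-00000`. Thanks to LazyOutputFormat, reducers that never write to the default output leave no empty part-r-* files behind.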