How to overwrite an existing output file/dir during execution of MapReduce jobs?

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)


    When submitting a MapReduce job, we need to supply a new output directory where the job will write its output. If the output directory already exists, the job throws a FileAlreadyExistsException saying that the output directory already exists. I don’t want to supply a new directory each time I submit a job. How can I configure MapReduce to overwrite an existing output directory?



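    There are two ways to delete the existing output directory before the job runs (not recommended, because the previous results are lost):
    1) Using the shell:
    bin/hadoop dfs -rmr /path/to/your/output/
    (On recent Hadoop releases -rmr is deprecated; use bin/hadoop fs -rm -r /path/to/your/output/ instead.)
    2) Using the Java API: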

    // configuration should contain reference to your namenode
    FileSystem fs = FileSystem.get(new Configuration());
    // true stands for recursively deleting the folder you gave
    fs.delete(new Path("/path/to/your/output"), true);
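
    The Java API call above can be wrapped in a small guard so that a driver only deletes the directory when it is actually there. The following is a minimal sketch, not part of Hadoop: the class name OutputDirCleaner and the method deleteIfExists are hypothetical names chosen for illustration, and the sketch assumes the Hadoop client libraries are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper (not part of Hadoop): removes a stale job output directory.
final class OutputDirCleaner {

    private OutputDirCleaner() {
    }

    /**
     * Recursively deletes dir if it exists.
     * Returns true if something was deleted, false if there was nothing to delete.
     */
    static boolean deleteIfExists(Configuration conf, String dir) throws IOException {
        Path outDir = new Path(dir);
        // Resolve the file system that owns this path (HDFS, local, ...)
        FileSystem fs = outDir.getFileSystem(conf);
        if (!fs.exists(outDir)) {
            return false;
        }
        // true = delete the directory and everything below it
        return fs.delete(outDir, true);
    }
}
```

    A driver would call OutputDirCleaner.deleteIfExists(conf, "/path/to/your/output") before submitting the job. Using outDir.getFileSystem(conf) rather than FileSystem.get(conf) is a design choice here: it picks the file system from the path itself, so a fully qualified path pointing at another file system still resolves correctly.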

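    If you want to write into the existing directory instead of deleting it, subclass a Hadoop OutputFormat class and override checkOutputSpecs so that it no longer rejects an existing output directory: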

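    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileAlreadyExistsException;
    import org.apache.hadoop.mapred.InvalidJobConfException;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapreduce.security.TokenCache;

    public class OverwriteOutputDirOutputFile<K, V> extends TextOutputFormat<K, V> {
        @Override
        public void checkOutputSpecs(FileSystem ignored, JobConf job)
                throws FileAlreadyExistsException, InvalidJobConfException, IOException {
            // Ensure that the output directory is set
            Path outDir = getOutputPath(job);
            if (outDir == null && job.getNumReduceTasks() != 0) {
                throw new InvalidJobConfException("Output directory not set in JobConf.");
            }
            if (outDir != null) {
                FileSystem fs = outDir.getFileSystem(job);
                // normalize the output directory
                outDir = fs.makeQualified(outDir);
                setOutputPath(job, outDir);
                // get delegation token for the outDir's file system
                TokenCache.obtainTokensForNamenodes(job.getCredentials(),
                        new Path[] {outDir}, job);
                // existence check disabled on purpose so an existing directory is accepted
                /* if (fs.exists(outDir)) {
                    throw new FileAlreadyExistsException("Output directory " + outDir +
                            " already exists");
                } */
            }
        }
    }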

    Then set this class as the output format in the job configuration.
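
    As a rough illustration of that last step, the sketch below registers an output format class and an output directory on a JobConf using the old "mapred" API. The class OverwriteJobSetup and the method configureOutput are hypothetical names introduced here, not part of Hadoop or of the original answer; only JobConf.setOutputFormat and FileOutputFormat.setOutputPath are actual Hadoop calls.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;

// Hypothetical driver helper (not from the original post), old "mapred" API.
final class OverwriteJobSetup {

    private OverwriteJobSetup() {
    }

    /** Registers the output format and the output directory on the job configuration. */
    @SuppressWarnings("rawtypes")
    static void configureOutput(JobConf job,
                                Class<? extends OutputFormat> formatClass,
                                String outputDir) {
        // Tell the job which OutputFormat (and therefore which checkOutputSpecs) to use
        job.setOutputFormat(formatClass);
        // The directory may already exist if formatClass skips the existence check
        FileOutputFormat.setOutputPath(job, new Path(outputDir));
    }
}
```

    In a driver this would be called as OverwriteJobSetup.configureOutput(job, OverwriteOutputDirOutputFile.class, "/path/to/your/output") before the job is submitted. Note that skipping the existence check does not clear the directory, so files from an earlier run that the new job does not rewrite may remain.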
