How to overwrite an existing output file/dir during execution of MapReduce jobs?

This topic contains 1 reply, has 1 voice, and was last updated by dfbdteam3 1 year, 6 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
    Posts
  • #4614

    dfbdteam3
    Moderator

    When submitting a MapReduce job we need to supply a new output directory where the job will write its output. If the output directory already exists, the job fails with a FileAlreadyExistsException ("Output directory ... already exists"). I don't want to supply a new directory every time I submit a MapReduce job. How can I configure MapReduce to overwrite an existing output directory?

    #4615

    dfbdteam3
    Moderator

    There are two ways to delete the output directory (not recommended if it may still hold data you need) before running a MapReduce job:
    1) Using the shell:
    bin/hadoop fs -rm -r /path/to/your/output/
    (on older Hadoop releases: bin/hadoop dfs -rmr /path/to/your/output/)
    2) JAVA API:

    // the configuration should contain a reference to your namenode
    FileSystem fs = FileSystem.get(new Configuration());
    // "true" deletes the directory recursively
    fs.delete(new Path("/path/to/your/output"), true);
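    The delete can also be wired into the job driver itself so the job is rerunnable without manual cleanup. A minimal sketch, assuming the classic org.apache.hadoop.mapred API; MyJobDriver and the input/output paths are placeholder names:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJobDriver.class);
        Path out = new Path("/path/to/your/output");

        // Remove a stale output directory, if any, before submitting,
        // so checkOutputSpecs() no longer sees an existing directory.
        FileSystem fs = out.getFileSystem(conf);
        if (fs.exists(out)) {
            fs.delete(out, true); // true = recursive
        }

        FileInputFormat.setInputPaths(conf, new Path("/path/to/your/input"));
        FileOutputFormat.setOutputPath(conf, out);
        JobClient.runJob(conf);
    }
}
```

    Note that this silently discards the previous run's results, so it is only safe when the output is always reproducible from the input.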

    If you want the job itself to tolerate an existing output directory, subclass Hadoop's OutputFormat (here TextOutputFormat) and relax the output check:

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileAlreadyExistsException;
    import org.apache.hadoop.mapred.InvalidJobConfException;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapreduce.security.TokenCache;

    public class OverwriteOutputDirOutputFile<K, V> extends TextOutputFormat<K, V> {

        @Override
        public void checkOutputSpecs(FileSystem ignored, JobConf job)
                throws FileAlreadyExistsException, InvalidJobConfException, IOException {
            // Ensure that the output directory is set
            Path outDir = getOutputPath(job);
            if (outDir == null && job.getNumReduceTasks() != 0) {
                throw new InvalidJobConfException("Output directory not set in JobConf.");
            }
            if (outDir != null) {
                FileSystem fs = outDir.getFileSystem(job);
                // normalize the output directory
                outDir = fs.makeQualified(outDir);
                setOutputPath(job, outDir);

                // get delegation tokens for the outDir's file system
                TokenCache.obtainTokensForNamenodes(job.getCredentials(),
                        new Path[] { outDir }, job);

                // The existence check is deliberately commented out, so an
                // existing output directory no longer aborts the job:
                /* if (fs.exists(outDir)) {
                    throw new FileAlreadyExistsException("Output directory " + outDir
                            + " already exists");
                } */
            }
        }
    }

    and then register this class as the output format in the job configuration.
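    Registering the subclass is a one-liner in the driver. A sketch assuming the classic JobConf API; DriverConfig is a placeholder, and OverwriteOutputDirOutputFile is the class defined above:

```java
import org.apache.hadoop.mapred.JobConf;

public class DriverConfig {
    static void configure(JobConf conf) {
        // Use the overwrite-friendly format in place of plain TextOutputFormat;
        // its existence check is a no-op, so reruns simply overwrite the output.
        conf.setOutputFormat(OverwriteOutputDirOutputFile.class);
    }
}
```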

