Does 'hdfs dfs -cp' use /tmp as part of its implementation

Time:04-27

I'm trying to investigate an issue where /tmp is filling up and we don't know what's causing it. We recently made a change that uses the HDFS command to copy a file to another host (hdfs dfs -cp /source/file hdfs://other.host:port/target/file), and while the copy operation doesn't directly touch or reference /tmp, it could potentially be using it as part of its implementation.

But I can't find anything in the documentation to confirm or refute that theory - does anyone else know the answer?

CodePudding user response:

You could look at the code:

Here's the code for copying using HDFS. It uses its own internal CommandWithDestination class and writes everything through another internal class that is really just java.io classes under the hood (to complete the actual write). So it's buffering bytes in memory and sending them along, which makes it unlikely to be the cause. You could check this by changing the temp directory Java uses (java.io.tmpdir):

export _JAVA_OPTIONS=-Djava.io.tmpdir=/new/tmp/dir

According to the java.io.File Javadoc:

The default temporary-file directory is specified by the system property java.io.tmpdir. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "c:\temp". A different value may be given to this system property when the Java virtual machine is invoked, but programmatic changes to this property are not guaranteed to have any effect upon the temporary directory used by this method.
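To see which directory a given JVM actually resolves, a small probe along these lines may help. It is only a sketch: the class name TmpDirCheck and its helper methods are made up for illustration and are not part of Hadoop. It prints java.io.tmpdir and creates one throwaway temp file to show where it lands.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, not part of Hadoop or the JDK.
class TmpDirCheck {
    // Returns the directory the JVM currently treats as its default temp location.
    static String configuredTmpDir() {
        return System.getProperty("java.io.tmpdir");
    }

    // Creates a throwaway temp file in the default temp directory and returns it.
    static File createProbe() throws IOException {
        File probe = File.createTempFile("tmpdir-probe", ".tmp");
        probe.deleteOnExit(); // remove the probe when the JVM exits
        return probe;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir   = " + configuredTmpDir());
        System.out.println("probe created in = " + createProbe().getParentFile());
    }
}
```

Running it once normally and once with -Djava.io.tmpdir=/new/tmp/dir (or with the _JAVA_OPTIONS export above) should show the probe moving to the new directory, which confirms the override is being picked up.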

Method used by HDFS copy:

protected void copyStreamToTarget(InputStream in, PathData target)
  throws IOException {
    if (target.exists && (target.stat.isDirectory() || !overwrite)) {
      throw new PathExistsException(target.toString());
    }
    TargetFileSystem targetFs = new TargetFileSystem(target.fs);
    try {
      // unless writing directly, the data first goes to "<target>._COPYING_" on the target filesystem
      PathData tempTarget = direct ? target : target.suffix("._COPYING_");
      targetFs.setWriteChecksum(writeChecksum);
      targetFs.writeStreamToFile(in, tempTarget, lazyPersist, direct); // here's where it uses java.io to write the file to HDFS
      if (!direct) {
        targetFs.rename(tempTarget, target);
      }
    } finally {
      targetFs.close(); // last ditch effort to ensure temp file is removed
    }
  }
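Note that the only temporary file in this method is the "._COPYING_" one, and it is created next to the destination on the target filesystem, not under the local /tmp. As a rough local-filesystem analogy of that write-then-rename pattern (a sketch using java.nio.file, not Hadoop's API; the class CopyThenRename is invented for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local analogy of the copy-then-rename pattern; not Hadoop code.
class CopyThenRename {
    // Streams `in` to "<target>._COPYING_" beside the target, then renames it into place.
    static void copyStreamToTarget(InputStream in, Path target) throws IOException {
        Path tempTarget = target.resolveSibling(target.getFileName() + "._COPYING_");
        try {
            Files.copy(in, tempTarget, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tempTarget, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tempTarget); // clean up if the copy or rename failed
        }
    }

    public static void main(String[] args) throws IOException {
        // A scratch directory is used here only so the demo has somewhere to write.
        Path dir = Files.createTempDirectory("copy-demo");
        Path target = dir.resolve("file.txt");
        copyStreamToTarget(new ByteArrayInputStream("hello".getBytes()), target);
        System.out.println("target exists: " + Files.exists(target));
        System.out.println("temp exists:   " + Files.exists(dir.resolve("file.txt._COPYING_")));
    }
}
```

The intermediate file always shares a directory with the destination, so with hdfs dfs -cp any leftover "._COPYING_" files would show up in the target HDFS directory rather than in /tmp on the local host.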