Merging two files with an HDFS application that writes them into one file located in HDFS

Time:10-31

How can the two input files be specified in the application, and how can they then be merged?

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.io.IOUtils;

public class Task1
{
    public static void main(String[] args) throws Exception
    {
        String file_1 = args[0];
        Configuration config_1 = new Configuration();
        FileSystem into_fs = ???
        Path into_path = new Path(file_1)

Where is the file made, or how do I continue from here?

Answer:

FileSystem.get(config_1) will return a FileSystem object. From there, create one Path per input file by passing each location as a String to the Path constructor. You are already reading args[0] for the first file, so read args[1] for the second one in the same way.

Your input files need to exist on HDFS already, for example by uploading them with hadoop fs -put.
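As a rough sketch of how the pieces above could fit together (not necessarily what the asker ended up with): the class name MergeTwoFiles, the merge helper, the 4096-byte buffer, and the use of a third argument args[2] for the output location are all assumptions made for illustration. It opens each input with fs.open, creates the output with fs.create, and copies the bytes across with IOUtils.copyBytes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeTwoFiles
{
    // Hypothetical helper: writes the bytes of each source file, in order,
    // into a single new file at `merged` on the given file system.
    static void merge(FileSystem fs, Path merged, Path... sources) throws Exception
    {
        // create(path, true) overwrites the target if it already exists.
        try (FSDataOutputStream out = fs.create(merged, true)) {
            for (Path source : sources) {
                try (FSDataInputStream in = fs.open(source)) {
                    // `false` keeps both streams open; try-with-resources closes them.
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        // Assumed argument order: first input, second input, merged output.
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(config);
        merge(fs, new Path(args[2]), new Path(args[0]), new Path(args[1]));
    }
}
```

Assuming the class is packaged into a jar (the jar name and paths here are made up), it could be run with something like: hadoop jar merge.jar MergeTwoFiles /in/a.txt /in/b.txt /out/merged.txt. Whether FileSystem.get(config) resolves to HDFS depends on the fs.defaultFS setting in the cluster configuration that is on the classpath.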

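Alternatively, just use the hadoop fs -getmerge command, which is itself written in Java, so you could look at its source code if you wanted. Note that -getmerge writes the merged result to the local filesystem, so you would still need hadoop fs -put to copy it back into HDFS.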
