How can the two different files be specified in the application, and how do I then merge them?
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.io.IOUtils;

public class Task1
{
    public static void main(String[] args) throws Exception
    {
        String file_1 = args[0];
        Configuration config_1 = new Configuration();
        FileSystem into_fs = null; // ??? (how do I get the FileSystem?)
        Path into_path = new Path(file_1);
    }
}
Where is the file created, and how do I continue from here?
Answer:
FileSystem.get(config_1) returns a FileSystem object. From there, pass each file's location as a String to the Path constructor and create two Path objects. For example, you already read args[0], so read args[1] as well.
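Putting those steps together, a minimal sketch might look like this. It assumes "merge" means plain byte concatenation (the first file followed by the second) and that the output location is passed as a hypothetical third argument, args[2], which the question does not mention. The class name MergeTwoFiles and the helper merge are illustrative only. With a default Configuration and no core-site.xml on the classpath, FileSystem.get returns the local file system, which makes the sketch easy to try before pointing it at HDFS.

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeTwoFiles {

    // Hypothetical helper: writes the bytes of 'first' followed by the bytes of 'second' into 'out'.
    static void merge(FileSystem fs, Path first, Path second, Path out) throws IOException {
        // create(path, true) overwrites 'out' if it already exists
        try (FSDataOutputStream os = fs.create(out, true)) {
            for (Path in : new Path[] {first, second}) {
                try (InputStream is = fs.open(in)) {
                    // close=false: both streams are closed by try-with-resources instead
                    IOUtils.copyBytes(is, os, 4096, false);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default file system from the configuration (HDFS when fs.defaultFS points at a NameNode)
        FileSystem fs = FileSystem.get(conf);
        Path first = new Path(args[0]);
        Path second = new Path(args[1]);
        Path out = new Path(args[2]); // assumption: output path supplied as a third argument
        merge(fs, first, second, out);
    }
}
```

After packaging it into a jar, it could be run with something like hadoop jar merge.jar MergeTwoFiles /user/me/a.txt /user/me/b.txt /user/me/merged.txt, where the jar name and paths are placeholders.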
Your files need to already exist on HDFS, for example uploaded with hadoop fs -put.
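If you would rather do the upload from Java, FileSystem.copyFromLocalFile(Path, Path) is roughly the programmatic analogue of hadoop fs -put (unlike -put without -f, it overwrites an existing destination). This is a minimal sketch; the class UploadToHdfs and the helper upload are hypothetical names.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToHdfs {

    // Hypothetical helper: copies a file from the local disk to 'destination' on the given file system.
    static void upload(FileSystem fs, String localFile, String destination) throws IOException {
        fs.copyFromLocalFile(new Path(localFile), new Path(destination));
    }

    public static void main(String[] args) throws Exception {
        // Default file system from the configuration (HDFS on a configured cluster)
        FileSystem fs = FileSystem.get(new Configuration());
        upload(fs, args[0], args[1]); // args[0]: local file, args[1]: destination path
    }
}
```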
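Alternatively, just use the hadoop fs -getmerge command, which concatenates the files in an HDFS source into a single file on the local file system. It is itself written in Java, so you can read its source code if you want to see how it does this.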
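To give a feel for what such a command has to do, here is a simplified sketch of the same idea. It is not FsShell's actual implementation: it assumes the inputs are the regular files directly inside one source directory, merges them in path order, skips subdirectories, and writes the result to the local file system. GetMergeSketch and getMerge are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class GetMergeSketch {

    // Concatenates every regular file directly under srcDir (on srcFs) into one file dst (on dstFs).
    // Files are merged in path order; subdirectories are skipped, not recursed into.
    static void getMerge(FileSystem srcFs, Path srcDir, FileSystem dstFs, Path dst) throws IOException {
        FileStatus[] entries = srcFs.listStatus(srcDir);
        Arrays.sort(entries, Comparator.comparing((FileStatus s) -> s.getPath().toString()));
        try (FSDataOutputStream out = dstFs.create(dst, true)) {
            for (FileStatus entry : entries) {
                if (!entry.isFile()) {
                    continue; // skip directories and other non-file entries
                }
                try (InputStream in = srcFs.open(entry.getPath())) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);       // source: the default (e.g. HDFS) file system
        FileSystem local = FileSystem.getLocal(conf); // destination: the local file system, as with -getmerge
        getMerge(hdfs, new Path(args[0]), local, new Path(args[1]));
    }
}
```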