How to monitor Hadoop and Spark job progress and results via API from your own code?
Our company recently started a big data project, so I had to learn from scratch. By now I have the basic theory down and can write simple computation programs and run them individually, but the project requirements are more complex and involve several technologies, mainly Hadoop 2 (HDFS, YARN, MapReduce), Sqoop, Flume, and Spark 1.6. The overall pipeline can be summarized as:

1. Import the data in Oracle into Hive/HDFS via Sqoop, and import file data into Hive/HDFS via Flume.
2. Run a Spark program over the imported data and save the results as a Hive table.
3. Export the results to Oracle via Sqoop.

Each step must run to completion, and succeed, before the next one can start, so the whole process is serial. The company therefore wants a Java "master control" program that drives each step. My first thought, which feels a bit crude, is to use Java's Runtime to exec commands such as sqoop import/export and spark-submit (see the first sketch below). But how can the Java program monitor the Hadoop MapReduce jobs (Sqoop runs MapReduce underneath) and the Spark jobs? It doesn't matter if there is no progress information; the key is to get each job's status and whether it succeeded or failed, so the next step can start. Begging the gurus for guidance!!!!
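For context, here is a minimal sketch of what I mean by exec'ing the commands from Java (the main class, jar path, and arguments are just placeholders, not a real job):

import java.io.IOException;

public class StepRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Launch spark-submit as an external process; the class and jar
        // path below are placeholders for the real job.
        ProcessBuilder pb = new ProcessBuilder(
                "spark-submit", "--master", "yarn",
                "--class", "com.example.MyApp", "/path/to/app.jar");
        pb.inheritIO(); // forward the child's stdout/stderr to this JVM's console
        Process p = pb.start();
        int exitCode = p.waitFor(); // block until the command finishes
        System.out.println(exitCode == 0 ? "step succeeded" : "step failed: " + exitCode);
    }
}

As far as I know, sqoop and spark-submit (at least in yarn-client mode) exit non-zero on failure, so the exit code alone might already cover the success/failure part, but I don't know how reliable that is, which is why I'm asking about a proper API.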
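One direction I have been reading about, though I'm not sure it is the right one, is polling the ResourceManager through YARN's YarnClient API, since both the Sqoop MapReduce jobs and the Spark-on-YARN jobs show up as YARN applications. A rough sketch, assuming the application id (e.g. application_1474329708040_0001) can be taken from the submitting command's output:

import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class YarnJobMonitor {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration()); // picks up yarn-site.xml from the classpath
        yarnClient.start();

        // Application id string as printed by the submitting tool
        ApplicationId appId = ConverterUtils.toApplicationId(args[0]);

        EnumSet<YarnApplicationState> terminal = EnumSet.of(
                YarnApplicationState.FINISHED,
                YarnApplicationState.FAILED,
                YarnApplicationState.KILLED);

        ApplicationReport report = yarnClient.getApplicationReport(appId);
        while (!terminal.contains(report.getYarnApplicationState())) {
            Thread.sleep(5000); // poll every 5 seconds
            report = yarnClient.getApplicationReport(appId);
            System.out.println("state=" + report.getYarnApplicationState()
                    + " progress=" + report.getProgress());
        }
        // For a FINISHED application this distinguishes SUCCEEDED from FAILED
        System.out.println("final status: " + report.getFinalApplicationStatus());
        yarnClient.stop();
    }
}

I have also seen that Spark 1.6 ships org.apache.spark.launcher.SparkLauncher, which can start an application and report its state through a SparkAppHandle; would that be the better way to handle the Spark part? And is the YarnClient polling above a reasonable way to handle the Sqoop part?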