Spark application in incomplete section of spark-history even when completed

Time:11-02

In my Spark-history, some applications have been "incomplete" for a week now. I've tried killing them, closing the sparkContext(), and killing the main .py process, but nothing helped.

For example,

yarn application -status <id>

shows:

...
State: FINISHED
Final-State: SUCCEEDED
...
Log Aggregation Status: TIME_OUT
...

But in Spark-History I still see it in the incomplete section of my applications. If I open the application there, I can see 1 Active job with 1 Alive executor, but they have been doing nothing for a whole week. This seems like a logging bug, but as far as I know the problem only affects me; my coworkers don't have it.

This thread didn't help me, because I don't have access to start-history-server.sh.

I suppose this problem is because of

Log Aggregation Status: TIME_OUT

because my "completed" applications have

Log Aggregation Status: SUCCEEDED

What can I do to fix this? Right now I have 90 incomplete applications.

I've found a clear description of my problem in the same situation (yarn, spark, etc.), but there is no solution: What is 'Active Jobs' in Spark History Server Spark UI Jobs section

CodePudding user response:

From Spark Monitoring and Instrumentation:

...
3. Applications which exited without registering themselves as completed will be listed as incomplete -- even though they are no longer running. This can happen if an application crashes.
...

Meaning:
History Server's UI shows only those Spark applications whose event logs it can find in its spark.eventLog.dir directory (a config typically set to /user/spark/applicationHistory on Hadoop). If a log doesn't end with the special ApplicationEnd event:

{"Event":"SparkListenerApplicationEnd","Timestamp":1667223930402}

...the application is considered incomplete (even if it is no longer running) and will be displayed on the Incomplete Applications page.
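So one way to tell which bucket an application will land in is to check whether its event log ends with that event. A minimal sketch, assuming an uncompressed, single-file event log (one JSON object per line); rolled or compressed logs (.lz4, .zstd, .snappy) would need to be decompressed first:

```python
import json

def is_complete(eventlog_path):
    """Return True if the Spark event log ends with an ApplicationEnd event.

    Assumption: an uncompressed, line-delimited JSON event log, as written
    by spark.eventLog.enabled=true without compression or rolling.
    """
    last = None
    with open(eventlog_path) as f:
        for line in f:
            if line.strip():
                last = line
    if last is None:
        return False  # empty log: certainly incomplete
    return json.loads(last).get("Event") == "SparkListenerApplicationEnd"
```

A log for which this returns False is exactly what the SHS shows on the Incomplete Applications page.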

For your question this means that "moving" an application to the Completed Apps page won't be trivial: it would require manually editing the event log and re-uploading it to the SHS directory in Hadoop. Moreover, it won't solve anything, since most likely your application keeps crashing before it can write that final event, and its next run will end up on the same Incomplete page again.
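For completeness, the manual edit itself is just appending one line to the event log. This is a hypothetical repair sketch, not a recommendation -- the edited file would still have to be re-uploaded to the spark.eventLog.dir location in HDFS with the right name and permissions, which this snippet does not do:

```python
import json
import time

def append_application_end(eventlog_path):
    """Append a SparkListenerApplicationEnd event so the History Server
    will treat this log as complete on its next scan.

    Assumption: an uncompressed, line-delimited JSON event log that is
    no longer being written to by a running application.
    """
    event = {
        "Event": "SparkListenerApplicationEnd",
        "Timestamp": int(time.time() * 1000),  # epoch millis, as Spark writes it
    }
    with open(eventlog_path, "a") as f:
        f.write(json.dumps(event) + "\n")
```

Again: this only changes where the SHS files the run; it does nothing about the underlying crash.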

To diagnose the reason why it fails, perhaps you can look at the application driver logs for clues -- error or exception messages. A graceful shutdown looks different depending on which resource manager and deploy mode your app is using. For deploy-mode=cluster and YARN RM, it would look something like this:
22/10/31 11:11:11 INFO spark.SparkContext: Successfully stopped SparkContext
22/10/31 11:11:11 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
22/10/31 11:11:11 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
22/10/31 11:11:11 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
22/10/31 11:11:11 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://.../.../.sparkStaging/application_<appId>
22/10/31 11:11:11 INFO util.ShutdownHookManager: Shutdown hook called
22/10/31 11:11:11 INFO util.ShutdownHookManager: Deleting directory /.../.../appcache/application_<appId>/spark-<guid>
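With 90 stuck applications, checking those markers by eye gets tedious. A small helper can scan a driver log (e.g. the text fetched with `yarn logs -applicationId <appId>`) for the graceful-shutdown lines from the sample above; the marker strings are taken from that sample and may differ across Spark versions:

```python
def shutdown_markers(log_text):
    """Report which graceful-shutdown markers appear in a driver log.

    Marker substrings come from the YARN cluster-mode sample log above;
    a run missing all of them most likely died before unregistering.
    """
    markers = [
        "Successfully stopped SparkContext",
        "Unregistering ApplicationMaster with SUCCEEDED",
        "Shutdown hook called",
    ]
    return {m: (m in log_text) for m in markers}
```

Any application whose log comes back with all values False is a good candidate for closer inspection of the errors just before the log ends.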
