In Hadoop YARN, containers exit when they receive a SIGTERM signal. So how can I detect that a YARN container is about to end and run some custom code at that point? How do I inject that code into the YARN framework?
I am looking for a solution for Spark on YARN in particular, but ideally also a general one that applies to other services running on YARN (Hive on Tez, MapReduce).
User response:
If we are talking about cleaning up the node, consider tuning these NodeManager properties:
yarn.nodemanager.localizer.cache.target-size-mb
yarn.nodemanager.localizer.cache.cleanup.interval-ms
Good explanation of those properties here.
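To make the two cleanup properties concrete, here is a simplified Java model of what a periodic localizer-cache cleaner does, as I understand it: every cleanup.interval-ms, if the cached resources exceed target-size-mb, resources that no running container is using are deleted oldest-first until the cache is back under the target. The class, fields and sample file names are hypothetical and this is not NodeManager source code. The defaults of 10240 MB and 600000 ms are assumed from yarn-default.xml, so check them for your Hadoop version.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of a NodeManager-style localizer cache cleaner.
// NOT the real YARN implementation; it only illustrates how
// yarn.nodemanager.localizer.cache.target-size-mb and
// yarn.nodemanager.localizer.cache.cleanup.interval-ms interact.
public class LocalizerCacheModel {

    // One localized file/archive in the cache (hypothetical type).
    static final class Resource {
        final String name;
        final long sizeMb;
        final long lastUsedMs; // last access timestamp
        final int refCount;    // number of running containers using it

        Resource(String name, long sizeMb, long lastUsedMs, int refCount) {
            this.name = name;
            this.sizeMb = sizeMb;
            this.lastUsedMs = lastUsedMs;
            this.refCount = refCount;
        }
    }

    static final long TARGET_SIZE_MB = 10240;        // target-size-mb (assumed default)
    static final long CLEANUP_INTERVAL_MS = 600_000; // cleanup.interval-ms (assumed default)

    // One cleanup pass: evict unused resources, least recently used first,
    // until the total size is at or below the target. Returns evicted names.
    static List<String> cleanup(List<Resource> cache, long targetMb) {
        long total = 0;
        for (Resource r : cache) {
            total += r.sizeMb;
        }
        List<Resource> byAge = new ArrayList<>(cache);
        byAge.sort(Comparator.comparingLong(r -> r.lastUsedMs));
        List<String> evicted = new ArrayList<>();
        for (Resource r : byAge) {
            if (total <= targetMb) {
                break; // back under the target: stop deleting
            }
            if (r.refCount > 0) {
                continue; // still used by a running container: never deleted
            }
            cache.remove(r);
            total -= r.sizeMb;
            evicted.add(r.name);
        }
        return evicted;
    }

    public static void main(String[] args) {
        List<Resource> cache = new ArrayList<>(List.of(
                new Resource("old-job.jar", 6000, 1_000, 0),
                new Resource("spark-libs.zip", 5000, 2_000, 1),
                new Resource("udf.jar", 3000, 3_000, 0)));
        System.out.println("Cleaner runs every " + CLEANUP_INTERVAL_MS + " ms");
        // 14000 MB cached > 10240 MB target; removing the oldest unused file is enough.
        System.out.println("Evicted: " + cleanup(cache, TARGET_SIZE_MB)); // prints Evicted: [old-job.jar]
    }
}
```

In this model, resources still referenced by a running container are skipped, so a cache that is mostly in use can stay above the target; the cleaner only reclaims what nothing is using.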
User response:
For true freedom over SIGTERM handling, you may want to dig into the YARN source code to find how you could hijack or extend the container executor itself (the class configured by yarn.nodemanager.container-executor.class) and bend it to your will. This would mean compiling and deploying your own code to the cluster, but there is a project called Apache Bigtop that helps with building and packaging that sort of thing.
User response:
If you aren't going to log much and only want to log a little, you can abuse accumulators to send information back to the driver. Here's a great explanation/example. Accumulators aren't made for logging, but if you use them really sparingly, say for a handful of items, they will do the job. They are most useful for counting things, and each update is recorded at least once. (Inside transformations, if an executor dies and a stage re-runs, something could be counted twice, so be wary.) They're a holdover from the days of mappers/reducers.
An even better abuse of string accumulators: use one to post the location of your log file, so you can retrieve the file later.
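Here is a dependency-free Java model of that idea, so it can run without a Spark cluster. The class, method names and the /tmp/app-N.log paths are hypothetical and this is not Spark's API: each task attempt puts its log-file location into its own buffer, the driver merges the buffer only when the attempt succeeds, and a re-executed stage can merge the same partition twice. In real Spark the accumulator itself would come from something like `spark.sparkContext().collectionAccumulator("logLocations")`, with `add(...)` called on the executors and the value read on the driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of a string collection accumulator used to
// ship log-file locations to the driver. NOT Spark code: each task attempt fills
// a local buffer, and the driver merges it only when the attempt succeeds.
public class AccumulatorLogModel {

    // Driver-side merged value (analogous to reading the accumulator on the driver).
    private final List<String> driverValue = new ArrayList<>();

    // One task attempt: record "partition|host:path" and merge it only on success.
    void runAttempt(int partition, String host, boolean succeeds) {
        List<String> localBuffer = new ArrayList<>();
        localBuffer.add(partition + "|" + host + ":/tmp/app-" + partition + ".log");
        if (succeeds) {
            driverValue.addAll(localBuffer);
        }
        // A failed attempt's buffer is dropped, so its log location never arrives.
    }

    List<String> value() {
        return driverValue;
    }

    // Re-executed stages can merge a partition twice; keep the latest entry per partition.
    static Map<Integer, String> latestPerPartition(List<String> entries) {
        Map<Integer, String> latest = new LinkedHashMap<>();
        for (String e : entries) {
            String[] parts = e.split("\\|", 2);
            latest.put(Integer.parseInt(parts[0]), parts[1]);
        }
        return latest;
    }

    public static void main(String[] args) {
        AccumulatorLogModel acc = new AccumulatorLogModel();
        acc.runAttempt(0, "host-a", true);
        acc.runAttempt(1, "host-a", false); // executor on host-a dies mid-task
        acc.runAttempt(1, "host-b", true);  // task retried on another host
        acc.runAttempt(0, "host-b", true);  // stage re-run: partition 0 merged twice
        System.out.println(acc.value().size() + " raw entries"); // prints 3 raw entries
        System.out.println(latestPerPartition(acc.value()));
    }
}
```

Keeping only the latest entry per partition on the driver is one way to cope with the double counting. The model also shows a limitation: an attempt that dies never reports anything, so this only helps you find the logs of tasks that finished.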