I'm trying to use a log4j2 RegexFilter to suppress this Spark warning:
Truncated the string representation of a plan since it was too long.
Spark logs the warning because I set the config option spark.sql.maxPlanStringLength=0, since I don't want query plan output in the application logs.
Here is my Spark app, which triggers the warning:
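(For context, the same option can also be set on the session builder instead of at submit time; a minimal sketch, with the effect that any value of 0 truncates every plan string and so triggers the warning on any plan-producing action:)

```scala
import org.apache.spark.sql.SparkSession

// Sketch: setting spark.sql.maxPlanStringLength in code rather than via
// --conf on spark-submit. With a limit of 0, every plan string is truncated,
// so Spark logs the "Truncated ..." warning as soon as a plan is rendered.
val spark = SparkSession.builder()
  .appName("log4j2 demo")
  .config("spark.sql.maxPlanStringLength", "0")
  .getOrCreate()
```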
package sparklog4j2

import org.apache.spark.sql.SparkSession
import org.apache.logging.log4j.core.LoggerContext
import org.apache.logging.log4j.{Logger, LogManager, Level}

object Demo {
  def main(args: Array[String]): Unit = {
    val ctx: LoggerContext = LogManager.getContext().asInstanceOf[LoggerContext]
    val conf = ctx.getConfiguration()
    println(s"CONFIG NAME: ${conf.getName}")

    val spark = SparkSession.builder().appName("log4j2 demo").getOrCreate()
    import spark.implicits._
    spark.createDataset[String](Seq("foo", "bar")).show
  }
}
I build a fat jar with sbt assembly:
scalaVersion := "2.12.15"
version := "1.0.0"

libraryDependencies ++= Seq(
  "org.apache.logging.log4j" % "log4j-api" % "2.13.2",
  "org.apache.logging.log4j" % "log4j-core" % "2.13.2",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.13.2",
  "org.apache.logging.log4j" % "log4j-1.2-api" % "2.13.2" % "provided",
  "org.apache.spark" %% "spark-core" % "3.2.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
)
Here is my log4j2.json which defines the configuration level RegexFilter:
{
  "configuration": {
    "name": "sparklog4j2-demo",
    "RegexFilter": {
      "regex": ".*Truncated.*",
      "onMatch": "DENY",
      "onMismatch": "NEUTRAL"
    },
    "loggers": {
      "logger": [
        {
          "name": "org.apache.spark.*",
          "level": "error",
          "includeLocation": true
        }
      ],
      "root": {
        "level": "error",
        "includeLocation": true
      }
    }
  }
}
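As a sanity check that the pattern itself is not the problem: the regex .*Truncated.* does fully match the Spark warning text. This can be verified without Spark at all, using plain java.util.regex (a minimal, self-contained sketch):

```scala
import java.util.regex.Pattern

object RegexCheck extends App {
  val pattern = Pattern.compile(".*Truncated.*")
  val warning =
    "Truncated the string representation of a plan since it was too long."

  // matches() requires the entire string to match the pattern,
  // which the leading and trailing .* allow here.
  println(pattern.matcher(warning).matches()) // prints "true"
}
```

So if the filter is not firing, the regex is not the reason.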
And here is how I run the app:
spark-submit \
--verbose \
--class sparklog4j2.Demo \
--jars ./jars/log4j-1.2-api-2.13.2.jar \
--driver-java-options "-Dlog4j.configurationFile=files/log4j2.json -Dlog4j2.debug=true -DLog4jDefaultStatusLevel=trace" \
--conf "spark.sql.maxPlanStringLength=0" \
--files ./files/log4j2.json \
target/scala-2.12/log4j-spark-assembly-1.0.0.jar
As the app runs, this linkage error is emitted:
INFO StatusLogger Plugin [org.apache.hadoop.hive.ql.log.HiveEventCounter] could not be loaded due to linkage error.
java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/appender/AbstractAppender
Even though I've packaged log4j-core, this error suggests it's missing.
However, the app runs fine, and I see CONFIG NAME: sparklog4j2-demo,
which shows that my log4j2.json config has been loaded.
Yet spark emits this:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
...
WARN StringUtils: Truncated the string representation of a plan since it was too long.
So my filter is not working, and it appears that Spark is not even using my log4j config.
CodePudding user response:
Instead of using a RegexFilter, I was able to stop this warning by raising the level threshold of the loggers that emit it. Note that Spark 3.2.x still uses log4j 1.x for its own logging (Spark did not migrate to log4j 2 until 3.3.0), which is why Spark falls back to its default log4j profile instead of picking up a log4j2.json configuration. I added these lines to spark-3.2.1-bin-hadoop3.2/conf/log4j.properties.template:
log4j.logger.org.apache.spark.sql.catalyst.util.StringUtils=ERROR
log4j.logger.org.apache.spark.sql.catalyst.util=ERROR
And modified my submit command to load the properties file:
spark-submit \
--class sparklog4j2.Demo \
--jars ./jars/log4j-1.2-api-2.13.2.jar \
--driver-java-options "-Dlog4j.configuration=File:$HOME/lib/spark-3.2.1-bin-hadoop3.2/conf/log4j.properties.template" \
--conf "spark.sql.maxPlanStringLength=0" \
target/scala-2.12/log4j-spark-assembly-1.0.0.jar
This doesn't resolve the issues I was having with log4j 2, but it does stop the warning.
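The same threshold change can also be made programmatically at the start of the app, since the log4j-1.2-api bridge (or log4j 1.x itself) is on the classpath. A sketch, equivalent to the two log4j.properties lines above:

```scala
import org.apache.log4j.{Level, Logger}

// Raise the threshold of the loggers that emit the truncation warning,
// before the first query runs. This goes through the log4j 1.x API,
// which is what Spark 3.2.x's own logging uses.
Logger.getLogger("org.apache.spark.sql.catalyst.util.StringUtils").setLevel(Level.ERROR)
Logger.getLogger("org.apache.spark.sql.catalyst.util").setLevel(Level.ERROR)
```

This avoids editing the Spark distribution's conf directory, at the cost of hard-coding logger names in the app.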