Stop spark log warning "Truncated the string representation of a plan ...."

I'm trying to use a log4j2 RegexFilter to suppress the Spark warning Truncated the string representation of a plan since it was too long. Spark logs this warning because I set spark.sql.maxPlanStringLength=0, since I don't want query-plan output in the application logs.

Here is my Spark app, which triggers the warning:

package sparklog4j2

import org.apache.spark.sql.SparkSession
import org.apache.logging.log4j.core.LoggerContext
import org.apache.logging.log4j.LogManager

object Demo {
  def main(args: Array[String]): Unit = {
    val ctx: LoggerContext = LogManager.getContext().asInstanceOf[LoggerContext]
    val conf = ctx.getConfiguration()
    println(s"CONFIG NAME: ${conf.getName}")
    val spark = SparkSession.builder().appName("log4j2 demo").getOrCreate()
    import spark.implicits._
    spark.createDataset[String](Seq("foo","bar")).show
  }
}

I build a fat jar with sbt assembly:

scalaVersion := "2.12.15"

version := "1.0.0"

libraryDependencies ++= Seq(
  "org.apache.logging.log4j" % "log4j-api" % "2.13.2",
  "org.apache.logging.log4j" % "log4j-core" % "2.13.2",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.13.2",
  "org.apache.logging.log4j" % "log4j-1.2-api" % "2.13.2" % "provided",
  "org.apache.spark" %% "spark-core" % "3.2.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided",
)

Here is my log4j2.json which defines the configuration level RegexFilter:

{
  "configuration": {
    "name": "sparklog4j2-demo",
    "RegexFilter": {
      "regex": ".*Truncated.*",
      "onMatch": "DENY",
      "onMismatch": "NEUTRAL"
    },
    "loggers": {
      "logger": [
        {
          "name": "org.apache.spark.*",
          "level": "error",
          "includeLocation": true
        }
      ],
      "root": {
        "level": "error",
        "includeLocation": true
      }
    }
  }
}
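As a quick sanity check of the pattern itself, independent of any log4j wiring: RegexFilter applies the regex to the whole formatted message (a full match, not a substring search), which is why the leading and trailing .* are needed. A small self-contained Scala sketch (the object and method names are mine, purely illustrative):

```scala
object RegexCheck {
  // Same pattern as in log4j2.json. Note that '.' does not match newlines,
  // so a multi-line message would additionally need the DOTALL flag.
  private val pattern = ".*Truncated.*".r.pattern

  // Emulates the filter decision: true means the message would be denied.
  def shouldDeny(message: String): Boolean =
    pattern.matcher(message).matches()

  def main(args: Array[String]): Unit = {
    val warning =
      "Truncated the string representation of a plan since it was too long."
    println(shouldDeny(warning))           // the warning would be DENY-ed
    println(shouldDeny("Job finished OK")) // unrelated messages pass through
  }
}
```

So the regex is not the problem; the question is whether Spark's logging ever consults this configuration at all.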

And here is how I run the app:

spark-submit \
--verbose \
--class sparklog4j2.Demo \
--jars ./jars/log4j-1.2-api-2.13.2.jar \
--driver-java-options "-Dlog4j.configurationFile=files/log4j2.json -Dlog4j2.debug=true -DLog4jDefaultStatusLevel=trace" \
--conf "spark.sql.maxPlanStringLength=0" \
--files ./files/log4j2.json \
target/scala-2.12/log4j-spark-assembly-1.0.0.jar

As the app is running, this linkage error is emitted:

INFO StatusLogger Plugin [org.apache.hadoop.hive.ql.log.HiveEventCounter] could not be loaded due to linkage error.
java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/appender/AbstractAppender

Even though I've packaged log4j-core into the fat jar, this error suggests it's missing from the classpath.

However, the app itself runs fine, and I see CONFIG NAME: sparklog4j2-demo, which proves the app has loaded my log4j2.json config.

Yet spark emits this:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
...
WARN StringUtils: Truncated the string representation of a plan since it was too long.

So my filter is not working, and it appears that Spark is not even using my log4j config.

CodePudding user response:

Instead of using a RegexFilter, I was able to stop this warning by raising the level threshold of the relevant loggers. I added these lines to spark-3.2.1-bin-hadoop3.2/conf/log4j.properties.template:

log4j.logger.org.apache.spark.sql.catalyst.util.StringUtils=ERROR
log4j.logger.org.apache.spark.sql.catalyst.util=ERROR
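
Rather than editing the bundled template in place, an equivalent standalone log4j 1.x properties file can be passed via -Dlog4j.configuration. A minimal sketch, modeled on Spark's default template (the file name and console-appender layout are illustrative, not taken from the original post):

```properties
# my-log4j.properties -- minimal log4j 1.x config (file name is illustrative)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Raise the threshold so the WARN from StringUtils
# ("Truncated the string representation ...") is dropped
log4j.logger.org.apache.spark.sql.catalyst.util=ERROR
```

The second logger line alone should be sufficient, since log4j levels are inherited down the logger hierarchy (org.apache.spark.sql.catalyst.util.StringUtils falls under org.apache.spark.sql.catalyst.util).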

And modified my submit command to load the properties file:

spark-submit \
--class sparklog4j2.Demo \
--jars ./jars/log4j-1.2-api-2.13.2.jar \
--driver-java-options "-Dlog4j.configuration=File:$HOME/lib/spark-3.2.1-bin-hadoop3.2/conf/log4j.properties.template" \
--conf "spark.sql.maxPlanStringLength=0" \
target/scala-2.12/log4j-spark-assembly-1.0.0.jar

This doesn't resolve the issues I was having with log4j 2, but it does stop the warning.