I have the following code, which is used to SHA-hash columns in a Spark DataFrame:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sha2}

object hashing {
  // Replace each listed column with its SHA-256 hash.
  def process(hashFieldNames: List[String])(df: DataFrame): DataFrame =
    hashFieldNames.foldLeft(df) { case (acc, hashField) =>
      acc.withColumn(hashField, sha2(col(hashField), 256))
    }
}
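For context, a minimal usage sketch (the SparkSession setup, data, and column names here are illustrative, not part of the original code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hashing-demo").master("local[*]").getOrCreate()
import spark.implicits._

// sha2 accepts string or binary input, so a string column hashes fine
val df = Seq(("alice", "secret"), ("bob", "hunter2")).toDF("user", "personalData")
val hashed = hashing.process(List("personalData"))(df)
hashed.show(truncate = false)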
Now, in a separate file, I am testing my hashing.process using an AnyWordSpec test, as follows:

"The hashing.process" should {
  // some cases here that complete successfully
  "fail to hash a spark dataframe due to type mismatch" in {
    val goodColumns = Seq("language", "usersCount", "ID", "personalData")
    val badDataSample =
      Seq(
        ("Java", "20000", 2, "happy"),
        ("Python", "100000", 3, "happy"),
        ("Scala", "3000", 1, "jolly")
      )
    // assumes import spark.implicits._ is in scope for toDF
    val badDf =
      spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)
    // "ID" is an Int column, so sha2 cannot resolve it (this val was elided above)
    val hashFieldNames = List("ID", "personalData")
    val thrown = intercept[org.apache.spark.sql.AnalysisException] {
      hashing.process(hashFieldNames)(badDf)
    }
    assert(thrown.getMessage === ???) // the full, lengthy error message, which I do not want to copy-paste in its entirety
Usually, as I understand it, one would hard-code the whole error message to ensure it is exactly what we expect. However, the message is very lengthy, and I am wondering whether there is a better approach.
Basically, I have two questions:
a.) Is it considered good practice to match only the beginning of the error message and then follow up with a regex? I am thinking of something like this: thrown.getMessage === "cannot resolve sha2(ID, 256) due to data type mismatch: argument 1 requires binary type, however, ID is of int type.;" followed by a regex pattern such as \;(.*) for the remainder.
b.) If a.) is considered a hacky approach, do you have a working suggestion for how to do it properly?
Note: small errors are possible in the code above, as I adapted it for this post, but you should get the idea.
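For illustration, here is a minimal sketch of both styles of message matching using ScalaTest matchers (this reuses badDf and hashFieldNames from the test above; the import path assumes ScalaTest 3.1+, and the matched fragments are taken from the error message quoted in a.)):

import org.scalatest.matchers.should.Matchers._

val thrown = intercept[org.apache.spark.sql.AnalysisException] {
  hashing.process(hashFieldNames)(badDf)
}

// option from a.): anchor on a stable prefix of the message
thrown.getMessage should startWith("cannot resolve")

// alternative: match a characteristic fragment anywhere in the message
thrown.getMessage should include regex "argument 1 requires binary type"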
CodePudding user response:
OK, answering my own question. I have now solved it like this:

"fail to hash a spark dataframe due to type mismatch" in {
  val goodColumns = Seq("language", "usersCount", "ID", "personalData")
  val badDataSample =
    Seq(
      ("Java", "20000", 2, "happy"),
      ("Python", "100000", 3, "happy"),
      ("Scala", "3000", 1, "jolly")
    )
  val badDf =
    spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)
  val hashFieldNames = List("ID", "personalData")
  val thrownException = intercept[org.apache.spark.sql.AnalysisException] {
    hashing.process(hashFieldNames)(badDf)
  }
  // match a characteristic fragment of the message rather than the whole text
  thrownException.getMessage should include regex "type mismatch: argument 1 requires binary type"
}
I am leaving this post open for potential suggestions/improvements. According to https://github.com/databricks/scala-style-guide#intercepting-exceptions, the solution is still not ideal.
CodePudding user response:
You should not be asserting on exception messages (unless they are surfaced to the user, or something downstream relies on them). If throwing an exception is part of the contract, then you should throw one of a specific type with a given error code, and the tests should assert on that. And if it is not part of the contract, then who cares what the message says?
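For example, a minimal sketch of that idea (the exception type, error code, and up-front validation are all hypothetical additions, not part of the original code):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sha2}
import org.apache.spark.sql.types.{BinaryType, StringType}

// hypothetical typed exception carrying a stable error code
final case class HashingException(code: String, message: String)
  extends RuntimeException(message)

object SafeHashing {
  // validate column types up front and fail with a typed error,
  // instead of letting Spark's AnalysisException (and its message) leak out
  def process(hashFieldNames: List[String])(df: DataFrame): DataFrame = {
    hashFieldNames.foreach { name =>
      val dataType = df.schema(name).dataType
      if (dataType != StringType && dataType != BinaryType)
        throw HashingException(
          code = "HASH_TYPE_MISMATCH",
          message = s"Column '$name' has type $dataType; sha2 requires string or binary."
        )
    }
    hashFieldNames.foldLeft(df)((acc, name) => acc.withColumn(name, sha2(col(name), 256)))
  }
}

The test then asserts on the stable error code rather than the message text:

val e = intercept[HashingException] {
  SafeHashing.process(hashFieldNames)(badDf)
}
assert(e.code == "HASH_TYPE_MISMATCH")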