I want to parameterize whether there is a header and which separator to use when I read a CSV with Spark. I've written this:
DataFrameReader dataFrameReader = spark.read();
dataFrameReader = "csv".equalsIgnoreCase(params.getReadFileType()) ?
    dataFrameReader
        .option("sep", params.getDelimiter())
        .option("header", params.isHeader())
    : dataFrameReader;
I'm new to Groovy and I can't get dataFrameReader.option correctly mocked.
DataFrameReader dfReaderLoader = Mock(DataFrameReader)
DataFrameReader dfReaderOptionString = Mock(DataFrameReader)
DataFrameReader dfReaderOptionBoolean = Mock(DataFrameReader)
SparkSession sparkSession = Mock(SparkSession)
sparkSession.read() >> dfReaderLoader
dfReaderLoader.option(_ as String, _ as String) >> dfReaderOptionString
dfReaderOptionString.option(_ as String, _ as Boolean) >> dfReaderOptionBoolean
And it gives me a NullPointerException:
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.DataFrameReader.option(String, boolean)" because the return value of "org.apache.spark.sql.DataFrameReader.option(String, String)" is null
CodePudding user response:
If you don't really care about the intermediate invocations of a builder pattern, i.e. an object that returns itself, I'd suggest using a Stub, which will return itself if the method's return type matches its own type. Alternatively, you can use the declaration _ >> _ to achieve the same with a Mock.
given:
ThingBuilder builder = Mock() {
    _ >> _
}

when:
Thing thing = builder
    .id("id-42")
    .name("spock")
    .weight(100)
    .build()

then:
1 * builder.build() >> new Thing(id: 'id-1337') // <-- only assert the last call you actually care about
thing.id == 'id-1337'
Try it in the Groovy Web Console.
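For comparison, a minimal sketch of the Stub variant mentioned above, reusing the same hypothetical ThingBuilder/Thing classes: since a Stub returns itself whenever a method's return type matches its own type, only the final build() call needs an explicit response.

given:
ThingBuilder builder = Stub() {
    // Only the terminal call is stubbed; the intermediate builder calls
    // (id, name, weight) return the stub itself by default.
    build() >> new Thing(id: 'id-1337')
}

when:
Thing thing = builder
    .id("id-42")
    .name("spock")
    .weight(100)
    .build()

then:
thing.id == 'id-1337'

Note that with a Stub you give up the 1 * builder.build() cardinality check; use a Mock if you also want to verify that the call actually happened.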
That being said, the error would probably go away if you just remove the as String cast of the second argument of option, or fix it to be as Boolean, as the error suggests.
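For illustration, a hedged sketch of the first alternative, under the assumption that the stubbed interaction fails to match because the second argument is not a non-null String at runtime in the failing test:

// Loosen the matcher on the second argument so the first option(...) call
// matches regardless of the runtime type of the delimiter (assumption, not verified).
dfReaderLoader.option(_ as String, _) >> dfReaderOptionString
// The chained option("header", ...) call then reaches dfReaderOptionString as intended.
dfReaderOptionString.option(_ as String, _ as Boolean) >> dfReaderOptionBoolean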
CodePudding user response:
I do not know what your problem is, but my guess is that you create the mocks but then do not inject them into your class under test. If you do inject them, both your own version and Leonard's suggested improved version with a default response work:
Class under test and helper class:
class UnderTest {
    SparkSession spark
    Parameters params

    DataFrameReader produce() {
        DataFrameReader dataFrameReader = spark.read()
        dataFrameReader = "csv".equalsIgnoreCase(params.getReadFileType()) ?
            dataFrameReader
                .option("sep", params.getDelimiter())
                .option("header", params.isHeader())
            : dataFrameReader
    }
}

class Parameters {
    String readFileType
    String delimiter
    boolean header
}
Spock specification:
package de.scrum_master.stackoverflow.q74923254

import org.apache.spark.sql.DataFrameReader
import org.apache.spark.sql.SparkSession
import org.spockframework.mock.MockUtil
import spock.lang.Specification

class DataFrameReaderTest extends Specification {
    def 'read #readFileType data'() {
        given:
        DataFrameReader dfReaderLoader = Mock(DataFrameReader)
        DataFrameReader dfReaderOptionString = Mock(DataFrameReader)
        DataFrameReader dfReaderOptionBoolean = Mock(DataFrameReader)
        SparkSession sparkSession = Mock(SparkSession)
        sparkSession.read() >> dfReaderLoader
        dfReaderLoader.option(_ as String, _ as String) >> dfReaderOptionString
        dfReaderOptionString.option(_ as String, _ as Boolean) >> dfReaderOptionBoolean
        def underTest = new UnderTest(spark: sparkSession, params: parameters)

        expect:
        underTest.produce().toString().contains(returnedMockName)

        where:
        readFileType | parameters                                                               | returnedMockName
        'CSV'        | new Parameters(readFileType: readFileType, delimiter: ';', header: true) | 'dfReaderOptionBoolean'
        'XLS'        | new Parameters(readFileType: readFileType)                               | 'dfReaderLoader'
    }

    def 'read #readFileType data (improved)'() {
        given:
        SparkSession sparkSession = Mock() {
            read() >> Mock(DataFrameReader) {
                _ >> _
            }
        }
        def parameters = new Parameters(readFileType: readFileType, delimiter: ';', header: true)
        def underTest = new UnderTest(spark: sparkSession, params: parameters)

        expect:
        new MockUtil().isMock(underTest.produce())

        where:
        readFileType << ['CSV', 'XLS']
    }
}
Try it in the Groovy Web Console.
The result should look similar to this in your IDE:
DataFrameReaderTest ✔
├─ read #readFileType data ✔
│ ├─ read CSV data ✔
│ └─ read XLS data ✔
└─ read #readFileType data (improved) ✔
├─ read CSV data (improved) ✔
└─ read XLS data (improved) ✔