I normally use Python for scripts, but I am trying to write a unit test and am having a lot of issues. I would like to test a method that creates a parameter, --users. The value is how many occurred.
count_users(df, args.metrics)
df is a Spark DataFrame, and the metrics argument is set like so:
if __name__ == "__main__":
    parser = argparse.ArgumentParser("Processing args")
    parser.add_argument("--metrics", required=True)
    main(parser.parse_args())
The method looks like this:
def count_users(df, metrics):
    users = df.where(df.users > 0).count()
    temp_df = df.withColumn("user_count_values", F.lit(users))
    temp_df.write.json(metrics)
Now I am trying to write my test, and this is where I am unsure how to proceed:
def test_count_users(self):
    df = (
        SparkSession.builder.appName("test")
        .getOrCreate()
        .createDataFrame(
            data=[
                (Decimal(0),),
                (Decimal(22),),
            ],
            schema=StructType(
                [
                    StructField("users", DecimalType(38, 4), True),
                ]
            ),
        )
    )
    ap = argparse.ArgumentParser("Test args")
    ap.add_argument("metrics")
    args = {_.dest: _ for _ in ap._actions if isinstance(_, _StoreAction)}
    assert args.keys() == {"metrics"}
    count_users(df, args.metrics)
    self.assertTrue(args["metrics"], 1)
Right now I get an error that reads
count_users(df, args.metrics)
AttributeError: 'dict' object has no attribute 'metrics'
CodePudding user response:
It's unclear what you are trying to achieve with the args = { ... line, or the two asserts. Remove them and use something standard, like:
import argparse
parser = argparse.ArgumentParser("Test args")
parser.add_argument("--metrics", required=True)
args = parser.parse_args(["--metrics", "output.json"])
count_users(df, args.metrics)
Your args variable won't have the appropriate attribute(s) until you parse the arguments. Of course, normally you'd call
args = parser.parse_args()
and let the user provide the --metrics outputfilename.json arguments to the script. The above is more for example or test use cases.
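To make the difference concrete, here is a minimal sketch using only the standard library (no Spark needed). It assumes the same --metrics option as above; the file name output.json is just a placeholder. It shows that parse_args returns a Namespace with attribute access, while a plain dict does not support it, which is what the AttributeError in the question is complaining about.

```python
import argparse

parser = argparse.ArgumentParser("Test args")
parser.add_argument("--metrics", required=True)

# Passing an explicit list bypasses sys.argv, which is convenient in tests.
# "output.json" is a placeholder path chosen for illustration.
args = parser.parse_args(["--metrics", "output.json"])

# parse_args returns an argparse.Namespace, so attribute access works.
print(type(args).__name__)  # Namespace
print(args.metrics)         # output.json

# vars() converts the Namespace to a plain dict if key access is preferred.
as_dict = vars(args)
print(as_dict["metrics"])   # output.json

# A dict has keys, not attributes, so this reproduces the error from the question.
try:
    as_dict.metrics
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'metrics'
```

In a test you would then pass args.metrics (or as_dict["metrics"]) to count_users, ideally pointing at a temporary location so the test does not leave files behind.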