AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'areaUnde

Time:10-18

I want to compute the area under the ROC curve for my machine learning model, but an AttributeError is raised. Below is my complete code, with the error details included.

from pyspark.ml.classification import LogisticRegression

data = pp_df.select(
    F.col("VectorAssembler_features").alias("features"),
    F.col("HSCode").alias("label"),
)

model = LogisticRegression().fit(data)

model_summary.areaUnderROC

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\AZMANM~1\AppData\Local\Temp/ipykernel_4856/3039136250.py in <module>
----> 1 model_summary.areaUnderROC
AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'areaUnderROC'

model.summary.pr.show()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\AZMANM~1\AppData\Local\Temp/ipykernel_4856/3388404637.py in <module>
----> 1 model.summary.pr.show()

AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'pr'

CodePudding user response:

You will need to use BinaryClassificationEvaluator.

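from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# train_set and test_set are assumed to be existing DataFrames
# with "features" and "label" columns.
lr = LogisticRegression(labelCol="label", featuresCol="features", maxIter=10,
                        threshold=0.5)  # optionally also regParam=0.05
lr_model = lr.fit(train_set)
predict_train = lr_model.transform(train_set)
predict_test = lr_model.transform(test_set)

# Evaluate the predictions instead of reading the metric from the training summary.
evaluator = BinaryClassificationEvaluator()
print("Test Area Under ROC: " + str(evaluator.evaluate(predict_test, {evaluator.metricName: "areaUnderROC"})))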
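For intuition about what the areaUnderROC metric reports, here is a minimal pure-Python sketch. It is not Spark's implementation, and the function name area_under_roc is hypothetical. It treats the metric as the fraction of (positive, negative) pairs in which the positive example gets the higher score, counting ties as half, so it only makes sense for 0/1 labels. Spark's evaluator may give slightly different values because it integrates over a binned ROC curve.

```python
def area_under_roc(labels, scores):
    """Pairwise estimate of the area under the ROC curve for 0/1 labels.

    Illustrative helper only; not part of PySpark.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up labels and scores, only to exercise the function.
    print(area_under_roc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

Because the definition relies on exactly two classes, a label column with many distinct values would need a different metric or a one-vs-rest setup.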

CodePudding user response:

Your code doesn't show how the model_summary variable is defined.

Did you perhaps mean to use model.summary.areaUnderROC rather than model_summary.areaUnderROC?

The following example works for me:

from pyspark.sql import Row, SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression

if __name__ == "__main__":

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    bdf = sc.parallelize(
        [
            Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)),
            Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)),
            Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)),
            Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0)),
        ]
    ).toDF()
    blor = LogisticRegression(weightCol="weight")
    blorModel = blor.fit(bdf)
    summary = blorModel.summary
    aur = summary.areaUnderROC
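One thing worth checking: the tracebacks in the question name LogisticRegressionTrainingSummary, while the example above uses 0/1 labels. As far as I know, PySpark returns the binary summary class (which carries areaUnderROC, roc and pr) only when the fitted model has at most two classes. If the HSCode label has more than two distinct values, which is an assumption on my part, those attributes will be missing. A small defensive sketch, using a hypothetical helper name and duck typing so that it works with any object that has a summary attribute:

```python
def area_under_roc_or_none(model):
    """Return model.summary.areaUnderROC, or None if the summary lacks it.

    `model` is assumed to be a fitted LogisticRegressionModel (or any object
    with a `summary` attribute). Hypothetical helper, not a PySpark API.
    """
    summary = model.summary
    # Binary summaries expose areaUnderROC; multiclass summaries do not.
    if hasattr(summary, "areaUnderROC"):
        return summary.areaUnderROC
    return None
```

With the model from the example above, area_under_roc_or_none(blorModel) should return the same value as summary.areaUnderROC. For the model in the question it would return None rather than raising, which is a hint to check model.numClasses.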