AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'

I am trying to store several estimators in a pandas DataFrame, and I keep running into this error:

AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'

Initially, I thought this was because pandas was trying to copy the estimator across several rows. However, I was able to reproduce the error with the following code:

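import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Raises the AttributeError shown above
pd.DataFrame({
    "foo" : "bar",
    "model" : RandomForestClassifier()
})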

I also tried saving the estimator class in a dictionary and instantiating it inside the DataFrame, as shown below:

d = {"rf" : RandomForestClassifier}
pd.DataFrame({
    "foo" : "bar",
    "model" : d["rf"](random_state=100)
})

Yet I still get the same error. My thinking is that if there is a solution for a single entry, I will be able to scale it up. Does anyone have any ideas?

CodePudding user response：

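The problem is that pandas tries to expand the values of your dictionary into values for multiple rows, and to do that it checks the len of each value. RandomForestClassifier defines a __len__ method that returns the number of fitted estimators (i.e. len(self.estimators_)). The estimators_ attribute does not exist until the model has been fitted, so calling len on an unfitted forest raises the AttributeError you are seeing.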
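
The length behaviour described above can be checked without pandas at all. This is a minimal sketch, assuming scikit-learn is installed; the toy data and the variable names (clf, unfitted_error, n_trees) are made up for illustration only.

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=5, random_state=0)

# Before fit(), estimators_ does not exist, so len() fails.
try:
    len(clf)
    unfitted_error = None
except AttributeError as exc:
    unfitted_error = str(exc)
print(unfitted_error)

# After fitting on a tiny toy dataset, len() is the number of trees.
clf.fit([[0], [1], [2], [3]], [0, 1, 0, 1])
n_trees = len(clf)
print(n_trees)
```

So the error does not come from your dictionary or from copying the estimator; it appears whenever something asks an unfitted forest for its length.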

In your one-row case, you can just wrap everything as singleton lists:

pd.DataFrame({
    "foo": ["bar"],
    "model": [RandomForestClassifier()],
})
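
The same idea should extend to several rows: give pandas one list per column, with one entry per row. The following is only a sketch of that, assuming pandas and scikit-learn are installed; the models registry, the "name" column and the LogisticRegression entry are hypothetical and not taken from the question.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical model registry; the names and the second estimator are
# illustrative and not part of the original question.
models = {
    "rf": RandomForestClassifier(random_state=100),
    "logreg": LogisticRegression(),
}

# With plain lists, pandas takes the row count from the lists themselves
# rather than from len() of an estimator.
df = pd.DataFrame({
    "name": list(models.keys()),
    "model": list(models.values()),
})
print(df.shape)
```

Each cell in the "model" column then holds the estimator object itself, so something like df.loc[0, "model"] gives back the RandomForestClassifier, ready to be fitted.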

CodePudding user response：

This is really bizarre; it must have something to do with the way a pandas DataFrame is instantiated. As a workaround, I don't get the same error when using pd.Series instead, which you can then turn into a DataFrame:

ser = pd.Series({
    "foo" : "bar",
    "model" : RandomForestClassifier(),
})
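# Note: this gives a single column indexed by "foo" and "model";
# use ser.to_frame().T instead if you want a single row.
df = pd.DataFrame(ser)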