I am trying to store several estimators in a pandas DataFrame, and I keep running into this error:
AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'
Initially, I thought this was because pandas was trying to copy the estimator into several rows. However, I was able to reproduce the error with the following code:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

pd.DataFrame({
    "foo": "bar",
    "model": RandomForestClassifier(),
})
I also tried storing the estimator class in a dictionary and instantiating it inside the DataFrame constructor, as seen below:
d = {"rf": RandomForestClassifier}
pd.DataFrame({
    "foo": "bar",
    "model": d["rf"](random_state=100),
})
but I still get the same error. My thinking is that if there is a solution for a single entry, I'll be able to scale it up to several. Does anyone have any ideas?
Answer:
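The problem is that pandas tries to expand the values of your dictionary into multiple rows. To decide how many rows to create, it checks whether each value is list-like and, if so, calls len() on it. RandomForestClassifier (through scikit-learn's BaseEnsemble base class) defines __iter__ and __len__ in terms of its fitted sub-estimators, i.e. len(estimators_). An unfitted model has no estimators_ attribute yet, so that len() call raises the AttributeError.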
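The mechanism can be reproduced without scikit-learn. The sketch below uses a hypothetical stand-in class, FakeEnsemble, that assumes only what is described above: its __iter__ and __len__ delegate to an estimators_ attribute that does not exist until the model is fitted. pandas.api.types.is_list_like reports such an object as list-like, so the DataFrame constructor asks for its length.

```python
import pandas as pd
from pandas.api.types import is_list_like


class FakeEnsemble:
    """Hypothetical stand-in for an unfitted scikit-learn ensemble.

    Like an ensemble, it delegates __len__ and __iter__ to estimators_,
    an attribute that would only be created by fitting.
    """

    def __len__(self):
        return len(self.estimators_)  # AttributeError: not fitted yet

    def __iter__(self):
        return iter(self.estimators_)


model = FakeEnsemble()

# Because the class defines __iter__, pandas treats it as list-like ...
print(is_list_like(model))

# ... so the dict constructor calls len() on it to size the index,
# which triggers the missing-attribute error.
try:
    pd.DataFrame({"foo": "bar", "model": model})
except AttributeError as exc:
    print(exc)
```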
In your one-row case, you can simply wrap each value in a single-element list:
pd.DataFrame({
    "foo": ["bar"],
    "model": [RandomForestClassifier()],
})
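The same idea should scale to several rows: give every column a list of equal length, one entry per row. Because each column is a plain list, pandas takes the length of the list rather than of the estimator inside it. The sketch again uses a hypothetical FakeEnsemble stand-in (so it runs without scikit-learn) and hypothetical seed values; with the real library you would put RandomForestClassifier instances in the list instead.

```python
import pandas as pd


class FakeEnsemble:
    """Hypothetical stand-in for an unfitted ensemble (len() would fail)."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def __len__(self):
        return len(self.estimators_)  # raises AttributeError until "fitted"

    def __iter__(self):
        return iter(self.estimators_)


# Hypothetical configurations: one row per model.
seeds = [0, 100, 200]
models = [FakeEnsemble(random_state=s) for s in seeds]

df = pd.DataFrame({
    "foo": ["bar"] * len(models),  # one value per row
    "model": models,               # a list, so pandas uses len(models)
})

print(df.shape)
print(df.loc[1, "model"].random_state)
```

Each cell of the "model" column then holds a reference to the corresponding estimator object.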
Answer:
This is really bizarre; it must have something to do with the way a pandas DataFrame is constructed. As a workaround, I don't get the same error when using pd.Series instead, which you can then turn into a DataFrame:
ser = pd.Series({
    "foo": "bar",
    "model": RandomForestClassifier(),
})
df = pd.DataFrame(ser)
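One thing to watch with this route: pd.DataFrame(ser) produces a single column indexed by "foo" and "model", not a single row. If you want the keys as columns, ser.to_frame().T transposes it into one row. A minimal sketch, using a hypothetical FakeEnsemble stand-in in place of RandomForestClassifier:

```python
import pandas as pd


class FakeEnsemble:
    """Hypothetical stand-in for an unfitted ensemble (len() would fail)."""

    def __len__(self):
        return len(self.estimators_)  # raises AttributeError until "fitted"

    def __iter__(self):
        return iter(self.estimators_)


model = FakeEnsemble()
ser = pd.Series({"foo": "bar", "model": model})

as_column = pd.DataFrame(ser)  # 2 rows ("foo", "model") x 1 column
as_row = ser.to_frame().T      # 1 row x 2 columns ("foo", "model")

print(as_column.shape)
print(as_row.shape)
```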