Is there a way to use predict on a selection of rows from a pandas DataFrame? As an example:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
clf.fit(X, y)  # X, y: the training data (not shown)

selection = [True, True, False, False, True, False]
data = pd.DataFrame.from_dict(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)
clf.predict(data[selection])
The idea is to call the classifier's predict method only on the rows where selection is True, while keeping the rows where selection is False as NaN. In this case the output should be something like:
[1, 0, NaN, NaN, 1, NaN]
Using clf.predict(data[selection]) I do get the classifier's predictions, but only for the selected rows, so the result no longer lines up with the rows of the original dataframe.
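One way to keep the alignment is to predict on the selected rows and write the results back into an all-NaN array of the original length, using the same boolean mask. The sketch below assumes the string column "B" is encoded as integers (RandomForestClassifier only accepts numeric features) and fits on hypothetical labels purely so the example runs; substitute your own training data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)
selection = np.array([True, True, False, False, True, False])

# Assumption: encode the string column "B" as integers so the model can use it.
features = data.assign(B=data["B"].map({"a": 0, "b": 1}))

# Hypothetical labels, only so the classifier has something to fit.
y = [1, 0, 1, 0, 1, 0]
clf = RandomForestClassifier(random_state=0).fit(features, y)

# Start with one NaN slot per original row...
result = np.full(len(features), np.nan)
# ...then fill only the selected slots; unselected rows stay NaN, order is kept.
result[selection] = clf.predict(features[selection])
print(result)
```

Because the same mask is used for both selecting and assigning, each prediction lands back in the position of the row it came from, and predict is called only once.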
Answer:
You can store the mask as a column and apply a function row by row, calling the model only where the flag is True:
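import numpy as np

# Store the mask as a column so each row carries its own flag
data["selection"] = selection
selected_cols = data.columns[:-1]  # every column except "selection"

def predict(x):
    if x.selection:
        # predict() expects 2-D input, so turn the row into a one-row DataFrame
        return clf.predict(x[selected_cols].to_frame().T)[0]
    else:
        return np.nan

data.apply(predict, axis=1)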
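The result is a Series aligned with the original index (placeholders stand for the model's predictions):
0    <prediction for row 0>
1    <prediction for row 1>
2    NaN
3    NaN
4    <prediction for row 4>
5    NaN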
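The row-wise apply calls predict once per selected row, which can be slow on large frames. A variant that makes a single predict call is to wrap the predictions in a Series indexed by the selected rows and then reindex it to the full index, which fills the unselected rows with NaN. As before, this is a sketch that assumes "B" is mapped to integers and uses hypothetical labels only so it runs.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)
selection = [True, True, False, False, True, False]

# Assumption: encode the string column "B" as integers so the model can use it.
features = data.assign(B=data["B"].map({"a": 0, "b": 1}))
y = [1, 0, 1, 0, 1, 0]  # hypothetical labels for illustration
clf = RandomForestClassifier(random_state=0).fit(features, y)

# A single predict() call on all selected rows, keeping their original index.
selected = features[selection]
predictions = pd.Series(clf.predict(selected), index=selected.index)

# reindex() restores every original row; unselected rows become NaN.
result = predictions.reindex(features.index)
print(result)
```

Since the alignment is done through the index rather than through row positions, this also works when the DataFrame has a non-default index.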