Sklearn predict using a subset of my data

Time:09-13

Is there a way to use predict on a selection of rows from a pandas DataFrame? As an example:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
clf.fit(X, y)  # X, y: the training data (not shown here)

selection = [True, True, False, False, True, False]
data = pd.DataFrame.from_dict(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)

clf.predict(data[selection])

The idea is to call the classifier's predict method only on the rows where selection is True, and to return NaN for the rows where selection is False. In this case the output should be something like:

[1, 0, NaN, NaN, 1, NaN]

With clf.predict(data[selection]) I do get the classifier's predictions, but only for the selected rows, so I lose their alignment with the rows of the original DataFrame.
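The alignment is not actually gone: boolean indexing keeps the original index labels of the selected rows, and only the array returned by predict drops them. A minimal sketch, assuming numeric-only features (the string column "B" is left out, since RandomForestClassifier cannot use it without encoding) and hypothetical training data X, y: wrap the predictions in a Series carrying the subset's index, then reindex to the full index so the unselected rows become NaN.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical numeric training data (the question does not show X and y)
X = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8], "c": [8, 7, 6, 5, 4, 3, 2, 1]})
y = [0, 0, 0, 0, 1, 1, 1, 1]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

data = pd.DataFrame({"A": [1, 5, 3, 6, 5, 7], "c": [5, 7, 4, 6, 5, 2]})
selection = [True, True, False, False, True, False]

# Boolean indexing keeps the original index labels of the selected rows...
subset = data[selection]
print(subset.index.tolist())  # [0, 1, 4]

# ...but predict() returns a bare NumPy array without any index
preds = clf.predict(subset)

# Attach the subset's labels, then reindex to the full index:
# rows that were not selected have no prediction and become NaN
aligned = pd.Series(preds, index=subset.index).reindex(data.index)
print(aligned)
```

Because reindex introduces NaN, the resulting Series is float64 even when the classifier predicts integer labels.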

Answer:

You can try something like this:

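import numpy as np

# Helper column marking the rows to predict on
data["selection"] = selection

# Every column except the helper column is a model feature
selected_cols = data.columns[:-1]

def predict(x):
    if x.selection:
        # Placeholder so the output below stays readable; call your model here instead,
        # e.g. clf.predict(x[selected_cols].to_frame().T)[0] (predict() expects 2D input)
        return "model.predict()"
    else:
        return np.nan

data.apply(predict, axis=1)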

0    model.predict()
1    model.predict()
2                NaN
3                NaN
4    model.predict()
5                NaN
dtype: object
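As an alternative to the row-wise apply, here is a sketch that calls predict once on all selected rows and writes the results back with .loc. It uses the question's own data; the labels y are made up only so that there is a fitted model, and the OneHotEncoder step is an assumption added because the string column "B" cannot be fed to RandomForestClassifier directly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)
selection = [True, True, False, False, True, False]
features = ["A", "B", "c"]

# Hypothetical labels, only so that there is a fitted model to call
y = [1, 0, 1, 0, 1, 0]

# One-hot encode the string column "B"; pass "A" and "c" through unchanged
model = make_pipeline(
    ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), ["B"])],
        remainder="passthrough",
    ),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
model.fit(data[features], y)

mask = pd.Series(selection, index=data.index)

# Start with an all-NaN column, then fill only the selected rows.
# The selected rows are read and written in the same order,
# so each prediction lands back on the row it came from.
data["prediction"] = np.nan
data.loc[mask, "prediction"] = model.predict(data.loc[mask, features])
print(data)
```

A single predict call on the whole subset is usually much faster than calling predict once per row through apply, since every predict call carries some fixed overhead.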