I'm using a Pydantic model (BaseModel) with FastAPI, converting the input into a dictionary and then into a Pandas DataFrame, which I pass to "model.predict" for a machine learning prediction, as below:
from fastapi import FastAPI
import uvicorn
from pydantic import BaseModel
import pandas as pd
from typing import List

app = FastAPI()

# `classifier` is assumed to be a trained model loaded elsewhere (not shown)


class Inputs(BaseModel):
    f1: float
    f2: float
    f3: str


@app.post('/predict')
def predict(features: List[Inputs]):
    output = []
    # loop over the list of input features
    for data in features:
        result = {}
        # convert the model into a dict and then into a one-row DataFrame
        data = data.dict()
        df = pd.DataFrame([data])
        # get prediction
        prediction = classifier.predict(df)[0]
        # get probability
        probability = classifier.predict_proba(df).max()
        # assign to dictionary
        result["prediction"] = prediction
        result["probability"] = probability
        # append dictionary to list (many outputs)
        output.append(result)
    return output
It works fine, but I'm not sure whether it's optimized or the right way to do it, since I convert the input twice to get the predictions, and I'm not sure it will be fast with a huge number of inputs. Are there any improvements? Is there a way (even one that doesn't use Pydantic models) to work with the data directly and avoid the conversions and the loop?
Answer:
First, you should use more descriptive names for your variables/objects. For example:
@app.post('/predict')
def predict(inputs: List[Inputs]):
    for input in inputs:
        # ...
You cannot pass the Pydantic model directly to the predict() function, as it accepts a data array, not a Pydantic model. Available options are listed below.
Option 1
You could use:
prediction = model.predict([[input.f1, input.f2, input.f3]])[0]
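The double brackets in Option 1 are there because predict() expects one row per sample, i.e. a 2-D structure, even when there is only one sample. A minimal sketch of building such rows with an explicit feature order is shown below. The dataclass is only a stand-in for the Pydantic model (attribute access works the same way), and FEATURE_ORDER and to_rows are hypothetical names, assumed to match the column order the model was trained on.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Inputs:
    """Stand-in for the Pydantic model; only attribute access is used here."""
    f1: float
    f2: float
    f3: str


# Assumption: this matches the column order the model was trained on.
FEATURE_ORDER = ("f1", "f2", "f3")
get_features = attrgetter(*FEATURE_ORDER)


def to_rows(items):
    """Return one row per item: a 2-D list of shape (n_samples, n_features)."""
    return [list(get_features(item)) for item in items]


single = to_rows([Inputs(f1=1.0, f2=2.0, f3="a")])
batch = to_rows([Inputs(1.0, 2.0, "a"), Inputs(3.0, 4.0, "b")])
```

Here single is still a list of rows with exactly one row, which is why the result of predict() is indexed with [0] to get the one prediction back.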
Option 2
If you don't wish to use a Pandas DataFrame, as shown in your question, i.e.,
df = pd.DataFrame([input.dict()])
prediction = model.predict(df)[0]
then you could use the __dict__ attribute to get the values of all fields in the model and convert them to a list:
prediction = model.predict([list(input.__dict__.values())])[0]
or, preferably, use Pydantic's .dict() method (renamed to .model_dump() in Pydantic v2, where .dict() is deprecated):
prediction = model.predict([list(input.dict().values())])[0]
Option 3
You could avoid looping over individual items and calling the predict() function multiple times by using the below instead:
import pandas as pd
df = pd.DataFrame([i.dict() for i in inputs])
prediction = model.predict(df)
probability = model.predict_proba(df)
return {'prediction': prediction.tolist(), 'probability': probability.tolist()}
or, if you don't wish to use a Pandas DataFrame:
inputs_list = [list(i.dict().values()) for i in inputs]
prediction = model.predict(inputs_list)
probability = model.predict_proba(inputs_list)
return {'prediction': prediction.tolist(), 'probability': probability.tolist()}
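If you still want one result per input, as in the question (a prediction plus its highest class probability), the batched outputs can be zipped back together after a single call. The sketch below uses a hypothetical StubClassifier in place of a real trained model so that it runs on its own; predict_batch is also a hypothetical helper name. With a real model that returns NumPy arrays, you would call .tolist() on both results first, as above, so the values are plain Python types.

```python
class StubClassifier:
    """Hypothetical stand-in for a trained model; counts calls to predict()."""

    def __init__(self):
        self.calls = 0

    def predict(self, rows):
        self.calls += 1
        return [1 if row[0] + row[1] > 1.0 else 0 for row in rows]

    def predict_proba(self, rows):
        return [[0.2, 0.8] if row[0] + row[1] > 1.0 else [0.7, 0.3] for row in rows]


def predict_batch(model, rows):
    """Call the model once for all rows, then pair up the per-row results."""
    predictions = model.predict(rows)
    probabilities = model.predict_proba(rows)
    return [
        {"prediction": pred, "probability": max(probs)}
        for pred, probs in zip(predictions, probabilities)
    ]


model = StubClassifier()
rows = [[0.9, 0.5, "a"], [0.1, 0.2, "b"]]
results = predict_batch(model, rows)
```

The model is called once regardless of how many rows are sent, which is where the saving over the per-item loop comes from.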