ndarray is not C-contiguous using scikit-learn knn.predict

Time:07-04

I am trying to call the predict function. To do this, I have the following code:

def convert_to_df(obj):
    obj_dic = obj.dict()
    df = pd.DataFrame(obj_dic.values(), index=obj_dic.keys())
    df.reset_index(drop=True, inplace=True)
    return df


@app.get("/get_rating")
def get_rating(features: Features):
    features = convert_to_df(features).T # shape (1, 26)
    return {'rating': Predictor().predict(features)}

but I am getting the following error:

File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: ndarray is not C-contiguous

How can I solve this?

Thanks

EDIT

Predictor is a wrapper around a KNN model trained with scikit-learn:

def predict(self, features) -> int:
    return self.model.predict(features)

Answer:

There still isn't enough code or traceback information to be sure, but I'll make a guess.

Let's make a simple dataframe:

In [31]: df = pd.DataFrame(np.ones((3,4)))

In [32]: df
Out[32]: 
     0    1    2    3
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0

predict probably uses some compiled code that expects C-contiguous data. If given a dataframe, it probably first converts it to an array, such as with np.array(df), or effectively:

In [35]: df.values
Out[35]: 
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [36]: df.values.flags
Out[36]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

But if you do a transpose, the contiguity changes. This is well known for arrays, and it looks like pandas behaves the same way:

In [37]: df.T.values.flags
Out[37]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
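
For reference, the same behaviour can be reproduced with plain NumPy, without pandas. This is only a small sketch on a made-up 3x4 array (the names a and t are illustrative, not from the question): the transpose is a view that swaps the strides, so the same memory is now laid out in Fortran order.

```python
import numpy as np

# A freshly created 2-D array is stored row by row (C order).
a = np.arange(12, dtype=float).reshape(3, 4)

# .T does not copy anything; it returns a view with the strides swapped.
t = a.T

print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])
print(a.strides, t.strides)
print(np.shares_memory(a, t))
```

Compiled code that asks for a C-contiguous buffer would accept a here but reject t.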

pandas transpose allows us to specify copy - see its docs:

In [38]: df.transpose(copy=True).values.flags
Out[38]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

So using this in your code might solve the problem:

 features = convert_to_df(features).transpose(copy=True)
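
Another option, which is also a guess since I can't run your code, is to force the layout on the NumPy side with np.ascontiguousarray before calling predict. The sketch below uses a made-up array (f_ordered) as a stand-in for the transposed features; it assumes the model is happy to receive a plain array instead of a dataframe.

```python
import numpy as np

# Stand-in for the transposed feature matrix (hypothetical data).
f_ordered = np.ones((3, 4)).T

# Returns a C-contiguous array; a copy is made only when the input is not one already.
c_ordered = np.ascontiguousarray(f_ordered)

print(f_ordered.flags['C_CONTIGUOUS'])
print(c_ordered.flags['C_CONTIGUOUS'])
print(np.array_equal(f_ordered, c_ordered))
```

In your code that would mean passing something like np.ascontiguousarray(features) to predict, where features is the transposed dataframe.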

I can't stress enough that you should include enough information in your question.
