Asking for help: NLP model evaluation

Time:09-27

I'm asking for help. I have already trained my model with the word2vec library, and my teacher has asked me to evaluate the model's accuracy and recall.

The training process is as follows:
 
import pickle
import jieba.analyse
from gensim.test.utils import common_texts, get_tmpfile
from gensim.models import word2vec

# Read the GBK-encoded corpus and segment it into words with jieba
with open(r'F:\CSDN\CSDN.txt', encoding='GBK') as f:
    document = f.read()
document_cut = jieba.cut(document)
result = ' '.join(document_cut)

# Save the space-separated tokens so LineSentence can read them back
with open(r'F:\CSDN\mi.txt', 'w', encoding='utf-8') as f2:
    f2.write(result)

sentences = word2vec.LineSentence(r'F:\CSDN\mi.txt')
path = get_tmpfile("CSDN.model")  # create a temporary file
# Skip-gram (sg=1) with hierarchical softmax (hs=1); gensim 4.x renames size to vector_size
model = word2vec.Word2Vec(sentences, sg=1, hs=1, min_count=10, sample=1e-3, window=10, size=200, alpha=0.025, seed=1)

# Persist the trained model with pickle
with open('SCDNCSDN.model', 'wb') as f:
    pickle.dump(model, f)


That gives me the trained model.
I want to use sklearn to evaluate its recall and accuracy, but I don't know how to construct y_pred and y_true, because my data has no labels.
Could someone more experienced please tell me what to do?
I found the following evaluation method online (via Baidu), but I don't know how to adapt it to my model...
 
import pickle
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the pickled word2vec model (not actually used by the iris example below)
with open('SCDNCSDN.model', 'rb') as pickle_in:
    model = pickle.load(pickle_in)

# Labeled iris data: features in iris_X, class labels in iris_y
iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target
X_train, X_test, y_train, y_test = train_test_split(iris_X, iris_y, test_size=0.9)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
params = knn.get_params()
print(params)
y_predict = knn.predict(X_test)
score = knn.score(X_test, y_test)  # mean accuracy on the test split
print("prediction score: %s" % score)

CodePudding user response:

Without labeled data, accuracy and recall in the usual sense cannot be computed. I think your teacher means something else: he wants you to show him how well your model works. There are two ways to meet that requirement:
1: Use TensorBoard or another tool to draw a scatter plot of the word vectors and check whether words of the same type end up close to each other.
2: word2vec learns a vector for each word, so you can build a small tool that takes a word as input and prints the 10 most similar words by vector similarity.
Either of these is enough to complete the task.
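
As a rough sketch of the second option, the snippet below ranks words by cosine similarity to a query word. It is only an illustration: `top_similar`, `cosine` and the toy 3-dimensional vectors are made up for this example and are not part of gensim or of the original poster's model, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_similar(vectors, word, topn=10):
    """Return up to topn (word, similarity) pairs, most similar first.

    `vectors` is a plain dict mapping each word to its vector
    (a hypothetical stand-in for a trained word2vec vocabulary).
    """
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors invented for illustration only
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
    "truck": [0.1, 0.0, 0.8],
}

for neighbour, similarity in top_similar(toy_vectors, "cat", topn=2):
    print(neighbour, round(similarity, 3))
```

With the pickled gensim model from the question, the same kind of ranking should be available directly through `model.wv.most_similar(word, topn=10)`, so a hand-written version like this is mainly useful for seeing what that call computes.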