I currently have a Machine Learning model which would predict what part of speech does a current word belong to
penn_results = penn_crf.predict_single(features)
and then, I made a code wherein it makes a print making a (word, POS) style print;
penn_tups = [(sent.split()[idx], penn_results[idx]) for idx in range(len(sent.split()))]
and when I try to run this, it gives me this output.
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN')] [('The', 'DET'), ('quick', 'NOUN'), ('brown', 'ADJ'), ('fox', 'NOUN'), ('jumps', 'NOUN'), ('over', 'ADP')]
and so I saved this model using
penn_filename = 'ptcp.sav'
pickle.dump(penn_crf, open(penn_filename, 'wb'))
Upon trying to run the model by loading hte saved pickle file with this
test = "The quick brown fox jumps over the head"
pickled_model = pickle.load(open('penn_treebank_crf_postagger.sav', 'rb'))
pickled_model.predict(test)
print(pickled_model.predict(test))
It prints this [['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP'], ['NNP']]
How can I make it print the accurate predicted values like this [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN')] [('The', 'DET'), ('quick', 'NOUN'), ('brown', 'ADJ'), ('fox', 'NOUN'), ('jumps', 'NOUN'), ('over', 'ADP')]
CodePudding user response:
Caution: this code was not tested.
Replace the last line
print(pickled_model.predict(test))
with something like this:
tokens_test = test.split()
predictions_test = pickled_model.predict(test)
pairs_test = [(tokens_test[idx], predictions_test[idx]) for idx in range(len(tokens_test))]
print(pairs_test)