MLlib TF-IDF: how to map the TF-IDF values in a SparseVector back to their words

Time:09-20

I recently computed TF-IDF with Spark MLlib, following the sample code on the official website:

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector

val sc: SparkContext = ...

// Load the documents (one per line).
val documents: RDD[Seq[String]] = sc.textFile("...").map(_.split(" ").toSeq)

val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(documents)

import org.apache.spark.mllib.feature.IDF

// ... continue from the previous example
tf.cache()
val idf = new IDF().fit(tf)
val tfidf: RDD[Vector] = idf.transform(tf)

The final result is an RDD[Vector]. Vector is an abstract type, and here the concrete subclass returned is generally SparseVector, which has three fields: size, indices, and values. values is an array of Double holding the TF-IDF value of each word in the document. However, when I try to find the word that each value corresponds to, I don't know where to start or how to look the word up. Does any expert know how?
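Because a hash cannot be inverted directly, one possible approach to the lookup described above is to hash every distinct word of the corpus once and keep an index-to-words table. The following is a minimal sketch in plain Java with no Spark dependency. The class and method names (`HashedTermLookup`, `reverseIndex`, `describe`) are hypothetical, and the local `indexOf` helper is only an assumed stand-in: in a real job the index should come from `HashingTF.indexOf(term)`, since the hash function used depends on the Spark version and settings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: recovers candidate words for the indices of a hashed TF-IDF vector.
final class HashedTermLookup {

    // Assumed stand-in for HashingTF.indexOf(term): non-negative modulo of hashCode().
    // The real index depends on the Spark version and its hash settings.
    static int indexOf(String term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    // Builds index -> words by hashing every word once.
    // Several words may share one index (a hash collision), hence a list per index.
    static Map<Integer, List<String>> reverseIndex(List<String> words, int numFeatures) {
        Map<Integer, List<String>> byIndex = new TreeMap<>();
        for (String w : words) {
            List<String> bucket =
                byIndex.computeIfAbsent(indexOf(w, numFeatures), k -> new ArrayList<>());
            if (!bucket.contains(w)) {
                bucket.add(w);
            }
        }
        return byIndex;
    }

    // Pairs each (index, value) of a sparse vector with the words hashed to that index.
    static List<String> describe(int[] indices, double[] values,
                                 Map<Integer, List<String>> byIndex) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < indices.length; i++) {
            List<String> words = byIndex.getOrDefault(indices[i], Collections.emptyList());
            out.add(words + " -> " + values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int numFeatures = 1 << 20; // same size as the no-argument HashingTF constructor
        List<String> doc = java.util.Arrays.asList("spark", "tf", "idf", "spark");
        Map<Integer, List<String>> byIndex = reverseIndex(doc, numFeatures);

        // Hypothetical sparse vector: one made-up weight at the index of "spark".
        int[] indices = { indexOf("spark", numFeatures) };
        double[] values = { 0.5 };
        describe(indices, values, byIndex).forEach(System.out::println);
    }
}
```

With a real SparseVector, `indices()` and `values()` would be passed to `describe`. When two words collide on one index, their weights cannot be separated, so the table can only report both candidates.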

CodePudding user response:

There aren't many people here. You could ask @cloud881001.

CodePudding user response:

Hello, has your problem been solved? I'd like to ask how you mapped the values back to the corresponding words.

CodePudding user response:

Hi, did you solve it? Could you share the solution?

CodePudding user response:

Hello,
http://stackoverflow.com/questions/35205865/what-is-the-difference-between-hashingtf-and-countvectorizer-in-spark

HashingTF is irreversible. I also haven't found how to reverse CountVectorizer. Have you solved this?
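On the CountVectorizer side, the fitted `CountVectorizerModel` in the DataFrame-based `spark.ml` API keeps its vocabulary as an array (`vocabulary()`), where position i is the word for vector index i. Assuming that array is available, the lookup might be sketched as below in plain Java. The class name, method name, and sample data are hypothetical and only illustrate the indexing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: maps sparse-vector indices back to words through a vocabulary array.
final class VocabularyLookup {

    // vocabulary[i] is taken to be the word for vector index i.
    static Map<String, Double> weightsByWord(String[] vocabulary, int[] indices, double[] values) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (int i = 0; i < indices.length; i++) {
            out.put(vocabulary[indices[i]], values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] vocabulary = { "spark", "tf", "idf" }; // hypothetical vocabulary
        int[] indices = { 0, 2 };                        // hypothetical sparse vector
        double[] values = { 0.5, 1.25 };
        System.out.println(weightsByWord(vocabulary, indices, values));
    }
}
```

Unlike the hashed case, each index here maps to exactly one word, so no collision handling is needed.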


Rube. Q

CodePudding user response:

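import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.mllib.linalg.SparseVector;
import org.apache.spark.mllib.linalg.Vector;

JavaRDD<Vector> idfvector = idfModel.transform(tagVectorTF);
idfvector.foreach(new VoidFunction<Vector>() {

    private static final long serialVersionUID = 1L;

    @Override
    public void call(Vector t) throws Exception {
        // Cast to the concrete subclass to reach its values array.
        SparseVector ss = (SparseVector) t;
        double[] aa = ss.values();
        // aa[2] assumes the vector has at least three non-zero entries.
        System.out.println("idf - " + t + " - st - " + aa[2]);
    }
});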

Written in Java: a forced cast to SparseVector does it.