Spark MLlib collaborative filtering expects an int userId, but the real ID is a string

The RDD type that MLlib's collaborative filtering algorithm accepts is:

JavaRDD<Rating>

A Rating consists of two ints and a double:

org.apache.spark.mllib.recommendation.Rating.Rating(int user, int product, double rating)

But my only user identifier is a UUID, so how do I convert it to a unique, corresponding int? Do I just build a mapping table of 1, 2, 3, 4, ... and match each number to a UUID?

CodePudding user response:

Has nobody else run into this problem?

CodePudding user response:

Why use a UUID as the user's unique identifier? Wouldn't it be simpler to use an auto-increment int directly?

CodePudding user response:

OP, how did you solve this in the end? Is the only option to build a map of 1, 2, 3, 4, ... and match it to the UUIDs?

CodePudding user response:

Have a look at StringIndexer usage

CodePudding user response:

Could you tell me how you solved this in the end? I have run into the same thing: my IDs are strings that mix digits and letters.

CodePudding user response:

Yes. Create a table with an auto-increment int primary key id, where each row maps one id to one user's UUID. Take care to deduplicate the UUIDs so the mapping stays one-to-one. Preprocess the raw data with SQL, replace each UUID with its id, and store the result in the training data file that the algorithm reads. Once the results are computed, map each id back to its UUID.
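The mapping idea discussed in this thread can be sketched in plain Java for a small dataset. This is only an illustration under assumptions: `UserIdIndex`, `encode`, and `decode` are hypothetical names, not part of Spark, the ints start at 0 rather than 1, and the whole map is assumed to fit in memory on one machine. For large data a database table or a join would be the more realistic route.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Spark): assigns each distinct string ID
// a dense int starting at 0 and remembers the reverse direction.
final class UserIdIndex {
    private final Map<String, Integer> toInt = new HashMap<>();
    private final List<String> toUuid = new ArrayList<>();

    // Returns the int for this UUID, allocating the next free int the first
    // time the UUID is seen. A repeated UUID gets the same int back, which
    // keeps the mapping one-to-one.
    int encode(String uuid) {
        Integer existing = toInt.get(uuid);
        if (existing != null) {
            return existing;
        }
        int id = toUuid.size();
        toInt.put(uuid, id);
        toUuid.add(uuid);
        return id;
    }

    // Maps an int taken from the model output back to the original UUID.
    String decode(int id) {
        return toUuid.get(id);
    }

    // Number of distinct UUIDs seen so far.
    int size() {
        return toUuid.size();
    }
}
```

The int returned by `encode` is what would go into the `user` argument of `Rating(int user, int product, double rating)`, and `decode` would be applied to the user ids in the results to get the UUIDs back.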
