Processing data stored in Hive

Time:10-31

A problem I ran into while studying: in my data, the left column is the member number and the right column is an array of the item numbers that member purchased. The data is already stored in Hive, partitioned by date. Next I want to find out which two items have the highest correlation over a period of time. My idea is to expand the item-number array on the right, pair by pair, into key/value pairs (call the result a), compute the statistics for each day by partition, and then aggregate over the period to find the two items with the highest correlation. Could the experts here suggest an approach? How should I do this, and what method or operator can I use to solve it?

CodePudding user response:

From a mathematical modeling point of view, this is a correlation problem, and there are many ways to solve it: Euclidean distance, nearest-neighbor methods, and so on.
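As a rough illustration of the pair-counting idea described in the question (plain co-occurrence counting, not the distance-based measures mentioned above), here is a minimal Python sketch. It assumes the partitions have already been read into a dict keyed by date string; the names `count_pairs`, `top_pair`, and `daily_baskets`, and the sample data, are hypothetical and only there to show the logic.

```python
from collections import Counter
from itertools import combinations


def count_pairs(baskets):
    """Count how often each unordered pair of item numbers is bought together."""
    counts = Counter()
    for items in baskets:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def top_pair(daily_baskets, start, end):
    """Sum the per-day pair counts for start <= day <= end and return the top pair."""
    total = Counter()
    for day, baskets in daily_baskets.items():
        if start <= day <= end:
            total.update(count_pairs(baskets))
    return total.most_common(1)[0] if total else None


# Hypothetical sample: one entry per date partition, each holding the
# item-number arrays of the members who bought something that day.
daily_baskets = {
    "10-29": [[101, 102, 103], [101, 102]],
    "10-30": [[102, 103], [101, 102, 104]],
    "10-31": [[101, 104]],
}

result = top_pair(daily_baskets, "10-29", "10-30")
print(result)  # ((101, 102), 3)
```

Counting per day first and summing afterwards matches the "statistics per partition, then over a period" plan in the question. Inside Hive itself, the array expansion step would probably be done with `LATERAL VIEW explode(...)` followed by a self-join and a `GROUP BY`, but the table and column names are not given here, so that part is left out.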