How to deduplicate consecutive repeated field values in 3D data with Spark - CodePudding

I have a dataset like this: 5 users, with each user's location recorded at 10 time points. For a single user, rows at adjacent time points that share the same location need to be deduplicated, keeping only the first row of each run of duplicates, so that consecutive time points always have different locations. Locations may repeat across different users.

My current thinking on the data layout is that either 2D or 3D would work.
2D data: user, location, time
User 1, location 1, time 1
User 1, location 1, time 2
User 1, location 2, time 3, and so on.
3D: the 2D table split up by user.

If I use the 2D data, how can I deduplicate consecutive repeated values of a field? The key point is that only consecutive repeats should be removed, and it has to be done with Spark.

Could an expert please show me how to do this? Thank you very much.

CodePudding user response:

Use the row_number() function: partition by user and location, order by time, and generate a sequence number for the rows within the same user and the same location. Then filter on that sequence number. row_number is an Oracle function, and there is plenty of material about it online, so have a look.

CodePudding user response:

Quoting qiongwei's reply on the 1st floor:

Use the row_number() function: partition by user and location, order by time, and generate a sequence number for the rows within the same user and the same location. Then filter on that sequence number. row_number is an Oracle function, and there is plenty of material about it online, so have a look.

Thank you very much, I'll go and look up the material.