Spark: after grouping by user, get the pay_amount of the record with the smallest pay_time

Time:09-17

I have a DataFrame with the columns
[user | pay_time | pay_amount]

After grouping by user, how do I get the pay_amount of the record with the smallest pay_time (that is, the amount of each user's first payment)? This is what I have so far:

    from pyspark.sql import functions as F

    df.groupBy("user").agg(
        F.min("pay_time").alias("first_pay_time"),     # first top-up time
        F.sum("pay_amount").alias("tot_pay_amount"),   # accumulated top-up amount
        # how do I also get the first top-up amount?
    )


The goal is to analyze each player's first top-up time and first top-up amount.

Answer:

There are two ways to do this:
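1:

    import org.apache.spark.sql.functions._

    // Earliest pay_time per user, aliased back to "pay_time" so it can be used as a join key
    val first_TimeDF = df.groupBy("user").agg(min("pay_time").alias("pay_time"))
    val full_InfoDF = first_TimeDF.join(df, Seq("user", "pay_time"), "left")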
This approach aggregates to find the minimum (first) top-up time, but the aggregation discards the first top-up amount, so the result has to be joined back to the original dataset to recover it.
2:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val win = Window.partitionBy("user").orderBy("pay_time")
    val firstDF = df.withColumn("rownum", row_number().over(win)).filter("rownum = 1").drop("rownum")
This approach uses a window function: partition the rows by user, sort each partition by pay_time in ascending order, and keep the row numbered 1. That row carries each user's first top-up time and first top-up amount.