As the title says, in Scala Spark, if I have a DataFrame like below:
+-----+------+
| key | Time |
+-----+------+
|   1 |    1 |
|   1 |    2 |
|   1 |    4 |
|   2 |    2 |
|   2 |    3 |
+-----+------+
For the same key, how can I keep only the row with the least time and drop all the other rows? In this case, key 1 has 3 rows with different times; the least time is 1, so I want to keep only the (key 1, time 1) row and drop the other 2 rows for key 1. Same with key 2: I want to keep only (key 2, time 2), so I drop the (key 2, time 3) row. The type of key is LongType and the type of time is StringType. Is there some way to achieve this?
+-----+------+
| key | Time |
+-----+------+
|   1 |    1 |
|   2 |    2 |
+-----+------+
I tried to use the drop and filter functions, but I don't think they work here.
CodePudding user response:
Try something similar to this, with my own data here of course:
%scala
import spark.implicits._
import org.apache.spark.sql.functions._

// Sample data: duplicate keys in column "i", values in column "c"
val df = sc.parallelize(Seq( (1,7), (1,8), (1,9), (2,2), (2,99) )).toDF("i", "c")

// Group by key and keep the minimum value per key
df.groupBy($"i").agg(min($"c")).show()
returns:
+---+------+
|  i|min(c)|
+---+------+
|  1|     7|
|  2|     2|
+---+------+
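One caveat for the question's actual schema: since Time is StringType, `min` would compare lexicographically, so "10" would sort before "2". A sketch (assuming a local SparkSession and the hypothetical names `MinTimePerKey` and `result`) that casts Time to long before aggregating and casts the minimum back to string to preserve the original schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MinTimePerKey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MinTimePerKey")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // key is LongType, Time is StringType, as in the question
    val df = Seq((1L, "1"), (1L, "2"), (1L, "4"), (2L, "2"), (2L, "3"))
      .toDF("key", "Time")

    // Cast Time to long so string comparison doesn't order "10" before "2",
    // then cast the per-key minimum back to string to keep the schema
    val result = df
      .groupBy($"key")
      .agg(min($"Time".cast("long")).cast("string").as("Time"))

    result.orderBy($"key").show()

    spark.stop()
  }
}
```

With the question's data this keeps exactly one row per key: (1, "1") and (2, "2").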