Add another column after groupBy and agg

Time:06-02

I have a DataFrame (docwords) that looks like this:

+-----+-------+-----+
|docId|vocabId|count|
+-----+-------+-----+
|    3|      3|  600|
|    2|      3|  702|
|    1|      2|  120|
|    2|      5|  200|
|    2|      2|  500|
|    3|      1|  100|
|    3|      5| 2000|
|    3|      4|  122|
|    1|      3| 1200|
|    1|      1| 1000|
+-----+-------+-----+

I want to output the max count for each vocabId, together with the docId it belongs to. I did this:

val wordCounts = docwords.groupBy("vocabId").agg(max($"count").as("count"))

and got this:

+-------+----------+
|vocabId|    count |
+-------+----------+
|      1|      1000|
|      3|      1200|
|      5|      2000|
|      4|       122|
|      2|       500|
+-------+----------+

How do I add the docId column at the front? It should look something like this (the order of the rows is not important):

+-----+-------+-----+
|docId|vocabId|count|
+-----+-------+-----+
|    2|      2|  500|
|    3|      5| 2000|
|    3|      4|  122|
|    1|      3| 1200|
|    1|      1| 1000|
+-----+-------+-----+

CodePudding user response:

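You can do a self join: join the aggregated result back to docwords on vocabId and count, something like below. Note that if several docIds tie for the max count of a vocabId, every tied row is returned.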

val wordCounts = docwords
  .groupBy("vocabId")
  .agg(max($"count").as("count"))
  .join(docwords, Seq("vocabId", "count")) // join back to recover docId
  .select("docId", "vocabId", "count")     // put docId at the front
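
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and runs the self join next to a window-function variant, which is a different technique from the join and is only shown as an alternative. The object name MaxCountPerVocab, the helper names viaJoin and viaWindow, and the local SparkSession settings are my own assumptions for illustration and are not from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, max, row_number}

// Hypothetical wrapper object; the names here are illustrative only.
object MaxCountPerVocab {

  // Self join: aggregate the max per vocabId, then join back on
  // (vocabId, count) to recover docId. Ties produce one row per tied docId.
  def viaJoin(docwords: DataFrame): DataFrame =
    docwords
      .groupBy("vocabId")
      .agg(max(col("count")).as("count"))
      .join(docwords, Seq("vocabId", "count"))
      .select("docId", "vocabId", "count")

  // Alternative: rank rows inside each vocabId by count (descending) and
  // keep the first. This keeps exactly one row per vocabId; which row wins
  // among ties is not defined.
  def viaWindow(docwords: DataFrame): DataFrame = {
    val byVocab = Window.partitionBy("vocabId").orderBy(desc("count"))
    docwords
      .withColumn("rn", row_number().over(byVocab))
      .where(col("rn") === 1)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("max-count-per-vocab")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val docwords = Seq(
      (3, 3, 600), (2, 3, 702), (1, 2, 120), (2, 5, 200), (2, 2, 500),
      (3, 1, 100), (3, 5, 2000), (3, 4, 122), (1, 3, 1200), (1, 1, 1000)
    ).toDF("docId", "vocabId", "count")

    viaJoin(docwords).show()
    viaWindow(docwords).show()

    spark.stop()
  }
}
```

On the sample data both versions should give the five rows the question asks for, in no particular order. The two only differ when counts tie within a vocabId: the join keeps all tied rows, the window version keeps one.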