I have this dataframe and I want to create another one like the expected one. The thing is that I want to take the values of col_2 of the same group 'a' of col_1 and put them in the same row in distinct columns. Is there a way to do it?
# ----- ----- ----- -
#|col_1| id |col_2|
# ----- ----- -----
#| a| 1| c|
#| a| 2| f|
#| a| 3| i|
# ----- ----- -----
Expected
# ----- ----- ----- -------
#|col_1|col_c|col_f| col_i |
# ----- ----- ----- -------
#| a| c | f | i |
----- ----- ----- -------
CodePudding user response:
Assume your dataset is called main
. We can use the following query to extract the values as columns:
var created = main.groupBy("col_1").pivot("col_2").agg(first(col("col_2")))
This gives this output (which is almost as you like):
----- --- --- ---
|col_1| c| f| i|
----- --- --- ---
| a| c| f| i|
----- --- --- ---
Now, we find the columns that do not start with col_
and we rename them in the dataset:
val columns = created.columns.filterNot(c => c.startsWith("col_"))
for (i <- columns) {
created = created.withColumnRenamed(i, "col_" i)
}
Final output:
----- ----- ----- -----
|col_1|col_c|col_f|col_i|
----- ----- ----- -----
| a| c| f| i|
----- ----- ----- -----