Suppose I have this data:
ColumnA | ColumnB |
---|---|
row1 | valueA |
row1 | valueB |
row2 | valueB |
How can I join the values of ColumnB for rows that share the same value in ColumnA? Expected result:
ColumnA | ColumnB |
---|---|
row1 | valueA, valueB |
row2 | valueB |
You can use `collect_set` and `concat_ws`:
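```scala
import org.apache.spark.sql.functions.{collect_set, concat_ws}

// Join the distinct ColumnB values of each ColumnA group with ", "
df.groupBy("ColumnA")
  .agg(concat_ws(", ", collect_set("ColumnB")).as("ColumnB"))
```

Note that `collect_set` removes duplicate values and does not guarantee their order. Use `collect_list` instead if you want to keep duplicates, or wrap the collected array in `array_sort` (Spark 2.4+) if you need a deterministic order.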
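To make the semantics concrete, here is a minimal plain-Scala sketch (no Spark involved) of what the aggregation computes on the sample rows. The object `GroupConcat` and the helper `groupConcat` are hypothetical names for illustration only. The values are sorted so the output is deterministic, which Spark's `collect_set` alone does not guarantee.

```scala
// Plain-Scala illustration (not Spark) of groupBy + collect_set + concat_ws.
object GroupConcat {
  // Groups (ColumnA, ColumnB) pairs by ColumnA and joins the distinct
  // ColumnB values with ", ". Sorting stands in for array_sort.
  def groupConcat(rows: Seq[(String, String)]): Seq[(String, String)] =
    rows
      .groupBy(_._1)                                 // like groupBy("ColumnA")
      .map { case (key, group) =>
        val values = group.map(_._2).distinct.sorted // like array_sort(collect_set(...))
        key -> values.mkString(", ")                 // like concat_ws(", ", ...)
      }
      .toSeq
      .sortBy(_._1)                                  // stable row order for printing

  def main(args: Array[String]): Unit = {
    val rows = Seq("row1" -> "valueA", "row1" -> "valueB", "row2" -> "valueB")
    groupConcat(rows).foreach { case (a, b) => println(s"$a | $b") }
  }
}
```

Running `main` prints one line per ColumnA value, matching the expected table above.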