I have working with following code.
df = spark.createDataFrame(data=[["john", "tomato", 1.99, 1],["john", "carrot", 0.45, 1],["bill", "apple", 0.99, 1],["john", "banana", 1.29, 1], ["bill", "taco", 2.59, 1]], schema = ["name", "food", "price", "col_1"])
---- ------ ----- -----
|name| food|price|col_1|
---- ------ ----- -----
|john|tomato| 1.99| 1|
|john|carrot| 0.45| 1|
|bill| apple| 0.99| 1|
|john|banana| 1.29| 1|
|bill| taco| 2.59| 1|
---- ------ ----- -----
I am using collect_list with multiple columns with below code.
df.groupBy('name').agg(collect_list(concat_ws(', ','food','price')).alias('sample')).show(10,False)
I getting below output.
---- ------------------------------------------
|name|sample |
---- ------------------------------------------
|john|[tomato, 1.99, carrot, 0.45, banana, 1.29]|
|bill|[apple, 0.99, taco, 2.59] |
---- ------------------------------------------
But i need to get the below output.
---- ------------------------------------------
|name|sample |
---- ------------------------------------------
|john|tomato 1.99, carrot 0.45, banana 1.29 |
|bill|apple 0.99, taco 2.59 |
---- ------------------------------------------
Is there any way to get the above output it will be helpful. I know above code won't get above output.
Could anyone provide the solution which i am expecting.
CodePudding user response:
You can use concat_ws to transform a list into a string using a separator:
df = spark.createDataFrame(
data=[["john", "tomato", 1.99, 1], ["john", "carrot", 0.45, 1], ["bill", "apple", 0.99, 1],
["john", "banana", 1.29, 1], ["bill", "taco", 2.59, 1]], schema=["name", "food", "price", "col_1"])
df = df.groupBy('name').agg(collect_list(concat(col("food"), lit(" "), col("price"))).alias('sample'))\
.withColumn("sample", concat_ws(",", "sample"))
df.show(truncate=False)
---- -----------------------------------
|name|sample |
---- -----------------------------------
|john|tomato 1.99,carrot 0.45,banana 1.29|
|bill|apple 0.99,taco 2.59 |
---- -----------------------------------