How can I pair the elements of consecutive rows within each group?
id  title  comp
1   'A'    45
1   'B'    32
1   'C'    1
2   'D'    5
2   'F'    6
I want to pair rows that share the same 'id'.
output:
id  title    comp
1   'A','B'  45,32
1   'B','C'  32,1
2   'D','F'  5,6
CodePudding user response:
Use a window function: for each row, collect a list containing that row and its immediate successor in the target columns. Convert the resulting arrays to strings with array_join to remove the array brackets. The last row of each group collects only itself, so filter out rows whose list has fewer than two elements.
from pyspark.sql import functions as F
from pyspark.sql import Window

# Pair each row with the next row in the same id group, ordered by title
win = Window.partitionBy('id').orderBy('title').rowsBetween(0, 1)
df.select(
    "id",
    *[F.array_join(F.collect_list(c).over(win), ',').alias(c) for c in df.drop('id').columns]
).filter(F.col('comp').contains(',')).show()  # keep only rows that collected a pair
+---+-----+-----+
| id|title| comp|
+---+-----+-----+
|  1|  A,B|45,32|
|  1|  B,C| 32,1|
|  2|  D,F|  5,6|
+---+-----+-----+
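For reference, the same "pair each row with the next row inside its id group" logic can be sketched in plain Python without Spark. This is only an illustration of the pairing idea on the sample data; the function name `pair_within_id` is made up for this sketch:

```python
from itertools import groupby

rows = [
    (1, 'A', 45), (1, 'B', 32), (1, 'C', 1),
    (2, 'D', 5), (2, 'F', 6),
]

def pair_within_id(rows):
    """Pair each row with the immediately following row that shares its id.

    Assumes rows are already sorted by (id, title), as in the sample data.
    """
    out = []
    for id_, grp in groupby(rows, key=lambda r: r[0]):
        grp = list(grp)
        # zip the group with itself shifted by one: the last row has no
        # successor, so it drops out automatically (no filter step needed)
        for cur, nxt in zip(grp, grp[1:]):
            out.append((id_, f"{cur[1]},{nxt[1]}", f"{cur[2]},{nxt[2]}"))
    return out

print(pair_within_id(rows))
# → [(1, 'A,B', '45,32'), (1, 'B,C', '32,1'), (2, 'D,F', '5,6')]
```

Note that here the lone last row of each group simply produces no pair, which is what the `rowsBetween(0, 1)` window plus the filter achieves in the Spark version.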