Join two rows iteratively to create a new table in Spark with one row for every two rows

I have a table that I want to process two rows at a time:

id  | col b | message
1   | abc   | hello
2   | abc   | world
3   | abc 1 | morning
4   | abc   | night
... | ...   | ...
100 | abc1  | Monday
101 | abc1  | Tuesday

How do I create the table below in Spark? It should take the rows two at a time and show the first row's id together with the second row's col b and message.

The final table should look like this:

id  | full message
1   | 01:02,abc,world
3   | 03:04,abc,night
... | ...
100 | 100:101,abc1,Tuesday

CodePudding user response:

With pandas, you can use:

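import numpy as np

# Group key shared by each pair of consecutive rows: 1, 1, 3, 3, 5, 5, ...
group = np.arange(len(df)) // 2 * 2 + 1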

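(df.astype({'id': 'str'})               # ids must be strings for ':'.join
   .groupby(group)
   .agg(**{'id': ('id', ':'.join),       # "1:2", "3:4", ...
           'first': ('col b', 'first'),  # col b from the first row of each pair
           'last': ('message', 'last'),  # message from the second row of each pair
          })
   .agg(','.join, axis=1)                # one comma-separated string per pair
   .rename_axis('id')                    # name the index so reset_index yields an 'id' column
   .reset_index(name='full message')
)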

Output:

   id          full message
0   1         1:2,abc,world
1   3       3:4,abc 1,night
2   5  100:101,abc1,Tuesday
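
The pairing logic itself does not depend on pandas. Below is a minimal pure-Python sketch, not the answerer's method. It assumes the rows are already sorted by id and that, as the question asks, both col b and message come from the second row of each pair. The pandas snippet above takes col b from the first row, hence "abc 1" in its output. The function name `pair_rows` and the six sample rows are illustrative assumptions, and ids are left unpadded as in the pandas output.

```python
def pair_rows(rows):
    """Combine consecutive rows two at a time.

    rows: list of (id, col_b, message) tuples, already sorted by id.
    Returns a list of (first_id, full_message) tuples.
    """
    paired = []
    # Step through the rows two at a time; a trailing unpaired row is skipped.
    for i in range(0, len(rows) - 1, 2):
        first_id, _, _ = rows[i]
        second_id, col_b, message = rows[i + 1]
        paired.append((first_id, f"{first_id}:{second_id},{col_b},{message}"))
    return paired


# Sample rows taken from the question (the elided middle rows are omitted).
rows = [
    (1, "abc", "hello"),
    (2, "abc", "world"),
    (3, "abc 1", "morning"),
    (4, "abc", "night"),
    (100, "abc1", "Monday"),
    (101, "abc1", "Tuesday"),
]

for first_id, full_message in pair_rows(rows):
    print(first_id, "|", full_message)
```

With these sample rows the loop prints `1 | 1:2,abc,world`, `3 | 3:4,abc,night` and `100 | 100:101,abc1,Tuesday`. In Spark, the same idea could probably be expressed with a window function such as `lead` over rows ordered by id, keeping every other row, though that variant is not shown or tested here.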