I have a table that I want to process in ranges of two rows:
id  | col b | message
1   | abc   | hello
2   | abc   | world
3   | abc 1 | morning
4   | abc   | night
... | ...   | ...
100 | abc1  | Monday
101 | abc1  | Tuesday
How do I create the table below in Spark? It should step through the rows in pairs and show the first id of each pair together with the col b and message from the second row.
The final table should look like this:
id  | full message
1   | 01:02,abc,world
3   | 03:04,abc,night
... | ...
100 | 100:101,abc1,Tuesday
CodePudding user response:
With pandas, you can use:
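import numpy as np

# Label each pair of consecutive rows: 1, 1, 3, 3, 5, 5, ...
group = np.arange(len(df))//2*2 + 1

(df.astype({'id': 'str'})
   .groupby(group)
   .agg(**{'id': ('id', ':'.join),        # both ids joined as "first:second"
           'first': ('col b', 'first'),   # col b from the first row of the pair
           'last': ('message', 'last'),   # message from the second row of the pair
          })
   .agg(','.join, axis=1)                 # combine the three parts into one string
   .rename_axis('id')                     # name the group label so it becomes the 'id' column
   .reset_index(name='full message')
)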
Output:
   id          full message
0   1         1:2,abc,world
1   3       3:4,abc 1,night
2   5  100:101,abc1,Tuesday
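The output above differs from the table requested in the question in three ways: the ids are not zero-padded, col b is taken from the first row of each pair, and the id column holds a positional label (1, 3, 5) rather than the first id of the pair. Below is a minimal pandas sketch of a variant that reproduces the requested format. It assumes the rows are already in the desired order and that two-digit padding is what the question intends. The helper names `pair`, `id_str`, `id_range` and `col_b` are illustrative only, and this is still pandas rather than Spark.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (the elided "..." rows are left out).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 100, 101],
    "col b": ["abc", "abc", "abc 1", "abc", "abc1", "abc1"],
    "message": ["hello", "world", "morning", "night", "Monday", "Tuesday"],
})

# Pair rows by position: rows 0-1 -> pair 0, rows 2-3 -> pair 1, ...
pair = np.arange(len(df)) // 2

# Zero-pad ids to at least two digits: 1 -> "01", 100 stays "100".
id_str = df["id"].astype(str).str.zfill(2)

out = (
    df.assign(id_str=id_str)
      .groupby(pair)
      .agg(
          id=("id", "first"),             # first id of the pair
          id_range=("id_str", ":".join),  # "first:second", zero-padded
          col_b=("col b", "last"),        # col b from the second row
          message=("message", "last"),    # message from the second row
      )
)

# Build the combined string and keep only the two requested columns.
out["full message"] = out["id_range"] + "," + out["col_b"] + "," + out["message"]
out = out[["id", "full message"]].reset_index(drop=True)
print(out)
```

With an odd number of rows, the last group contains a single row, so its id range would hold just one id; that case would need separate handling.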