Join two rows iteratively to create a new table in Spark with one row for every two rows

Time:12-01

I have a table that I want to process two rows at a time:

id  | col b | message
1   | abc   | hello
2   | abc   | world
3   | abc 1 | morning
4   | abc   | night
... | ...   | ...
100 | abc1  | Monday
101 | abc1  | Tuesday

How do I create the table below in Spark? It should step through the rows two at a time and show the first row's id together with the second row's col b and message.

The final table should look like this:

id | full message 
1  | 01:02,abc,world
3  | 03:04,abc,night
.. |................
100| 100:101,abc1,Tuesday

CodePudding user response:

With pandas, you can use:

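import numpy as np

# df is the question's table as a pandas DataFrame.
# Label consecutive pairs of rows 1, 1, 3, 3, 5, 5, ...
group = np.arange(len(df))//2*2+1

(df.astype({'id': 'str'})
   .groupby(group)
   .agg(**{'id': ('id', ':'.join),
           'first': ('col b', 'first'),
           'last': ('message', 'last'),
          })
   .agg(','.join, axis=1)
   .rename_axis('id')  # so reset_index names the label column 'id'
   .reset_index(name='full message')
)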

Output:

   id          full message
0   1         1:2,abc,world
1   3       3:4,abc 1,night
2   5  100:101,abc1,Tuesday
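
Note that the output above takes col b from the first row of each pair and leaves the ids unpadded, whereas the question's expected table uses the second row's col b and zero-padded ids (01:02). The pairing logic for that exact format can be sketched in plain Python. This is only an illustration, not Spark or pandas code: it assumes the rows are already sorted by id and held as (id, col b, message) tuples, and `pair_rows` is a hypothetical helper name.

```python
# Sample rows from the question, as (id, col_b, message) tuples.
rows = [
    (1, "abc", "hello"),
    (2, "abc", "world"),
    (3, "abc 1", "morning"),
    (4, "abc", "night"),
    (100, "abc1", "Monday"),
    (101, "abc1", "Tuesday"),
]


def pair_rows(rows):
    """Combine rows two at a time by position (hypothetical helper)."""
    out = []
    # Visit positions 0, 2, 4, ...; a trailing unpaired row is skipped.
    for i in range(0, len(rows) - 1, 2):
        first_id = rows[i][0]
        second_id, second_col_b, second_message = rows[i + 1]
        # Zero-pad ids to two digits, as in the question's expected table.
        full = f"{first_id:02d}:{second_id:02d},{second_col_b},{second_message}"
        out.append((first_id, full))
    return out


# Pair labels of the form position // 2 * 2 + 1, the kind of key
# a groupby can use to put rows 0-1, 2-3, 4-5 into the same group.
labels = [pos // 2 * 2 + 1 for pos in range(len(rows))]

result = pair_rows(rows)
for row_id, full_message in result:
    print(row_id, "|", full_message)
```

In the pandas answer, switching the col b aggregation from 'first' to 'last' would be the analogous change; a Spark version would additionally need an explicit ordering, since Spark rows have no inherent position.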