Pivot and Concatenate columns in pyspark dataframe

Time:08-24

I have the dataframe below, and I need to get one row with all the marks fields concatenated with a delimiter such as a pipe.
For example: PACKAGING MARKS 3|PACKAGING MARKS 2|PACKAG.....

There can be a varying number of marks records for each mid.

mid  marksId  id  index  marks
2    3        3   2      PACKAGING MARKS 3
2    3        3   1      PACKAGING MARKS 2
2    3        3   0      PACKAGING MARKS 1
2    4        4   2      PACKAGING MARKS 23
2    4        4   1      PACKAGING MARKS 22
2    4        4   0      PACKAGING MARKS 21

Thanks

CodePudding user response:

Assuming you want one delimited string per "mid", you can collect all "marks" values with collect_list() and join them into a single string with concat_ws():

import pyspark.sql.functions as F

# Collect every "marks" value per "mid" into a list, then join the list with "|".
(df.groupby('mid')
   .agg(F.concat_ws('|', F.collect_list('marks')).alias('marks_str'))
   .show(truncate=False))