Home > Mobile >  How to rank the column values in pyspark dataframe according to conditions
How to rank the column values in pyspark dataframe according to conditions

Time:12-02

i have a dataframe:

id  vehicle asIs    EU     EU_variant
1   A3345   PQ1298  FV1    FV1_variant
2   A3346   PQ1287  FV2    FV2_variant
3   A3346   PQ1207  FV2    FV2_variant
4   A3347           QP9    QP9_variant
5   A3347           QP9    QP9_variant
6   A3347           QP3    QP3_variant
7   A3348   MP6553  YR34   YR34_variant
8   A3348   MP6554  YR35   YR35_variant
9   A3348   MP6554  YR35   YR35_variant

for distinct vehicle and distinct EU i want to rank EU_variant and concat it in new column

result should be:

   id   vehicle asIs    EU     EU_variant   ECU_Variant_rank
    1   A3345   PQ1298  FV1    FV1_variant  FV1_variant(1)
    2   A3346   PQ1287  FV2    FV2_variant  FV2_variant(1)
    3   A3346   PQ1207  FV2    FV2_variant  FV2_variant(2)
    4   A3347           QP9    QP9_variant  QP9_variant(1)
    5   A3347           QP9    QP9_variant  QP9_variant(2)
    6   A3347           QP3    QP3_variant  QP3_variant(1)
    7   A3348   MP6553  YR34   YR34_variant YR34_variant(1)
    8   A3348   MP6554  YR35   YR35_variant YR35_variant(1)
    9   A3348   MP6554  YR35   YR35_variant YR35_variant(2)

how to achieve this using pyspark dataframe

CodePudding user response:

You can use a Window with rank:

from pyspark.sql import functions as F, Window

# you can order by the column you prefer, not only id
w = Window.partitionBy('vehicle', 'EU_variant').orderBy('id')
df.withColumn(
    'ECU_Variant_rank', 
    F.concat_ws('', F.col('EU_variant'), F.lit('('), F.rank().over(w), F.lit(')'))
)

Here the result:

 --- ------- ------ ---- ------------ ---------------- 
|id |vehicle|asIs  |EU  |EU_variant  |ECU_Variant_rank|
 --- ------- ------ ---- ------------ ---------------- 
|1  |A3345  |PQ1298|FV1 |FV1_variant |FV1_variant(1)  |
|2  |A3346  |PQ1287|FV2 |FV2_variant |FV2_variant(1)  |
|3  |A3346  |PQ1207|FV2 |FV2_variant |FV2_variant(2)  |
|4  |A3347  |null  |QP9 |QP9_variant |QP9_variant(1)  |
|5  |A3347  |null  |QP9 |QP9_variant |QP9_variant(2)  |
|6  |A3347  |null  |QP3 |QP3_variant |QP3_variant(1)  |
|7  |A3348  |MP6553|YR34|YR34_variant|YR34_variant(1) |
|8  |A3348  |MP6554|YR35|YR35_variant|YR35_variant(1) |
|9  |A3348  |MP6554|YR35|YR35_variant|YR35_variant(2) |
 --- ------- ------ ---- ------------ ---------------- 
  • Related