Error while using partition by clause in pyspark


I need to use a partition by clause on two columns and compute the row number, then extract only the rows where row_number = 1.

I have a dataframe df3 which holds this data:

[image of df3's contents in the original post]

I am trying to build a window specification that partitions by the two columns category_name and SubCategoryName and orders them by total_sales_360 descending:

    from pyspark.sql.window import Window
    from pyspark.sql.functions import row_number

    windowSpec = Window.partitionBy("category_name,SubCategoryName").orderBy("total_sales_360 desc")

    df3.withColumn("row_number", row_number().over(windowSpec)).show(truncate=False)

I get an error when I try to display df3 after applying the window function.

CodePudding user response:

Change .partitionBy("category_name,SubCategoryName") to .partitionBy("category_name", "SubCategoryName"). partitionBy takes each column name as a separate argument, not a single comma-separated string. The orderBy call has the same problem: "total_sales_360 desc" is treated as one (nonexistent) column name, so use desc("total_sales_360") or col("total_sales_360").desc() instead.
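
A minimal sketch of the corrected query, assuming df3 has the columns category_name, SubCategoryName, and total_sales_360 as shown in the question:

    from pyspark.sql.window import Window
    from pyspark.sql.functions import row_number, desc

    # Pass each partition column as a separate argument, and sort each
    # partition by total_sales_360 in descending order
    windowSpec = Window.partitionBy("category_name", "SubCategoryName") \
                       .orderBy(desc("total_sales_360"))

    # Number the rows within each partition, then keep only the top row
    # (the highest total_sales_360 per category/subcategory pair)
    df3.withColumn("row_number", row_number().over(windowSpec)) \
       .filter("row_number = 1") \
       .show(truncate=False)

The final filter("row_number = 1") handles the second part of the question: keeping only the first-ranked row from each partition.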
