Aggregate column on rows with condition

Time:10-25

I have the following dataframe:

Country    Qty
Belgium    54
Belgium    8
Belgium    67
France     12
France     3
France     34
Italy      25
Italy      45
Italy      9

Is it possible to group this dataframe by the "Country" column, aggregate the average of "Qty", and output the average Qty for Belgium only? I am using PySpark.

CodePudding user response:

This has been solved!

from pyspark.sql.functions import avg, col
df.filter(col("Country") == "Belgium").agg(avg(col("Qty")))

CodePudding user response:

from pyspark.sql import functions as F

(
    df
    .groupBy("Country")
    .agg(F.mean("Qty").alias("avg"))
    .filter(F.col("Country") == "Belgium")
    .show()
)

# output
+-------+----+
|Country| avg|
+-------+----+
|Belgium|43.0|
+-------+----+
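
For anyone who wants to sanity-check the 43.0 without a Spark session, here is a rough plain-Python sketch of the same arithmetic. It is not PySpark code: the names rows, by_country, averages and belgium_avg are made up for illustration, and the data is copied from the question. It compares grouping first and then picking Belgium against filtering to Belgium first and then averaging.

```python
from collections import defaultdict

# Rows copied from the question's dataframe (Country, Qty).
rows = [
    ("Belgium", 54), ("Belgium", 8), ("Belgium", 67),
    ("France", 12), ("France", 3), ("France", 34),
    ("Italy", 25), ("Italy", 45), ("Italy", 9),
]

# Group-then-filter: collect Qty per country, average each group,
# then look up Belgium (mirrors groupBy + mean + filter).
by_country = defaultdict(list)
for country, qty in rows:
    by_country[country].append(qty)
averages = {c: sum(q) / len(q) for c, q in by_country.items()}

# Filter-then-aggregate: keep only Belgium rows, then average them
# (mirrors filter + agg(avg)).
belgium_qty = [qty for country, qty in rows if country == "Belgium"]
belgium_avg = sum(belgium_qty) / len(belgium_qty)

print(averages["Belgium"], belgium_avg)  # 43.0 43.0
```

Both orders give the same number for Belgium. The filter-first form only touches the rows you care about, while the group-first form also computes the averages for France and Italy before discarding them.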