Why is alias not working with groupby and count


I'm running the following block and I'm wondering why .alias is not working:

data = [(1, "siva", 100), (2, "siva", 200), (3, "siva", 300),
        (4, "siva4", 400), (5, "siva5", 500)]
schema = ['id', 'name', 'salary']

df = spark.createDataFrame(data, schema=schema)
df.show()
display(df.select('name').groupby('name').count().alias('test'))

Is there a specific reason? In which cases is .alias() supposed to work in a situation like this? And why is no error returned?

CodePudding user response:

You could change the syntax a bit to apply the alias with no issue:

from pyspark.sql import functions as F

df.select('name').groupby('name').agg(F.count("name").alias("test")).show()

# output
+-----+----+
| name|test|
+-----+----+
|siva4|   1|
|siva5|   1|
| siva|   3|
+-----+----+

I am not 100% sure, but my understanding is that .count() on the grouped data returns an entire DataFrame, so .alias() is applied to the whole DataFrame (setting a DataFrame-level alias) rather than to the single count column; that is why it neither raises an error nor renames anything.
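
To make that concrete, here is a minimal sketch reusing the df defined in the question; the counts variable name is just for illustration. It shows that the column produced by .count() is still named count, and that renaming it afterwards with withColumnRenamed works:

from pyspark.sql import functions as F

# .count() on the grouped data returns a new DataFrame whose aggregate column is
# literally named "count", so .alias('test') only attaches a DataFrame-level alias --
# no error, but no column is renamed either.
counts = df.select('name').groupby('name').count().alias('test')
counts.printSchema()   # still shows: name, count

# To actually rename the aggregated column after the fact:
counts.withColumnRenamed('count', 'test').show()

# The DataFrame alias does become usable as a column qualifier:
counts.select(F.col('test.name'), F.col('test.count')).show()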
