I'm trying to group by a column called saleId and then get the sum of another column called totalAmount, with the code below:
df = df.groupBy('saleId').agg({"totalAmount": "sum"})
But I get the following error:
Attribute sum(totalAmount) contains an invalid character among ,;{}()\n\t=. Please use an alias to rename it
I'm assuming there's something wrong with the way I'm using groupBy, because I get other errors even when I try the following code instead of the above one:
df = df.groupBy('saleId').sum('totalAmount')
What's the problem with my code?
CodePudding user response:
OK, I figured out what went wrong.
The code I used in my question names the resulting column sum(totalAmount), and that name contains parentheses, which are among the invalid characters listed in the error.
This can be avoided by using:
df = df.groupBy('saleId').agg({"totalAmount": "sum"}).withColumnRenamed('sum(totalAmount)', 'totalAmount')
or
from pyspark.sql import functions as F
df = df.groupBy('saleId').agg(F.sum("totalAmount").alias("totalAmount"))
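To see why the default name is rejected, here is a minimal sketch in plain Python (no Spark needed). The helper names has_invalid_chars and safe_alias are hypothetical, not part of PySpark, and the character set is copied from the error message in the question; treating the space as part of that set is an assumption on my part.

```python
# Characters listed in the error message; including the space is an assumption.
INVALID_CHARS = set(" ,;{}()\n\t=")


def has_invalid_chars(name: str) -> bool:
    """Return True if the column name contains any character from the error message."""
    return any(ch in INVALID_CHARS for ch in name)


def safe_alias(name: str) -> str:
    """Replace each invalid character with '_' and strip leading/trailing underscores."""
    cleaned = "".join("_" if ch in INVALID_CHARS else ch for ch in name)
    return cleaned.strip("_")


default_name = "sum(totalAmount)"
print(has_invalid_chars(default_name))  # the parentheses make this name invalid
print(safe_alias(default_name))         # a name with the parentheses replaced
```

A name produced this way could then be passed as the second argument to withColumnRenamed, or to alias, in place of a hand-typed one.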