'Column' object is not callable after groupBy and aggregation in PySpark

I have the following dataframe.

docid_  province
123     zhejiang
123     zhejiang
123     shanghai
456     zhejiang

I want to find the most frequent province for each docid_, so I first group by docid_ and then count the frequency. But I get the error 'Column' object is not callable.

This is my code:

uin_feature_province_count = uin_feature.groupBy("docid_").\
    agg(col("province").groupBy("province").count().orderBy(col("province").desc).collect()(0).get(0).alias("most_province"))

Answer:

I haven't tried to fix your code, but if you just need the most common province for each docid_, you can use row_number() over a window partitioned by docid_ and ordered by the count of province, as below:

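from pyspark.sql import Window
from pyspark.sql.functions import col, count, row_number

# Rank each (docid_, province) group by its count within docid_,
# then keep only the top-ranked province per docid_.
uin_feature_province_count = (
    uin_feature.groupBy("docid_", "province")
    .agg(
        row_number()
        .over(Window.partitionBy("docid_").orderBy(count("province").desc()))
        .alias("rank")
    )
    .filter(col("rank") == 1)
    .select("docid_", "province")
)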