Pandas to Pyspark environment


newlist = []
for column in new_columns:
    # rows where consecutive values in this column increase by exactly 1
    count12 = new_df.loc[new_df[column].diff() == 1]
    # count the number of rows in each group
    new_df2 = new_df2.groupby(['my_id', 'friend_id', 'family_id', 'colleage_id']).apply(len)

There is no option available in PySpark for getting the length (row count) of each group.

How can we achieve the same thing in PySpark?

Thanks in advance.

CodePudding user response:

Essentially, apply(len) here is just an aggregation that counts the elements of each group produced by groupby. You can do the same thing with basic PySpark syntax:

import pyspark.sql.functions as F

(df
    .groupBy('my_id', 'friend_id', 'family_id', 'colleage_id')
    .agg(F.count('*'))   # number of rows in each group, same as apply(len)
    .show()
)
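If you don't need the generic agg form, GroupedData.count() gives the same result with the column already named count. A minimal, self-contained sketch; the SparkSession setup and sample rows below are illustrative assumptions, only the column names come from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data using the column names from the question
df = spark.createDataFrame(
    [(1, 10, 100, 1000), (1, 10, 100, 1000), (2, 20, 200, 2000)],
    ['my_id', 'friend_id', 'family_id', 'colleage_id'],
)

# Shorthand equivalent of .agg(F.count('*')): one row per group, with a `count` column
df.groupBy('my_id', 'friend_id', 'family_id', 'colleage_id').count().show()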