Drop rows with conditions in PySpark Pandas API

Time:07-01

I would like to know how to drop rows that match a condition using the pandas API on Spark (pyspark.pandas).

This is the pandas version:

indexNames = dfObj[(dfObj['Age'] >= 30) & (dfObj['Age'] <= 40)].index
dfObj.drop(indexNames, inplace=True)

How can I do the same with the pandas API on Spark?

Could you please help me?

Thanks a lot

CodePudding user response:

Start with this guide:

https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/pandas_pyspark.html#pandas

An example looks like this:

import pyspark.pandas as ps

psdf = ps.range(10)
pdf = psdf.to_pandas()
pdf.values

From there you can work with it as a regular pandas DataFrame.

CodePudding user response:

Thanks dude. I found the solution:

# indexNames is the index selected in the question (rows with 30 <= Age <= 40)
array = indexNames.to_numpy()
dfObj = dfObj.drop(index=array)
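
For anyone landing here later, an alternative that skips the index lookup is to keep the rows that do not match the condition, using a boolean mask. This is only a sketch: drop_age_range and the sample data are made up for illustration, and the demo uses plain pandas so it runs without a Spark session. The helper relies only on boolean indexing, which pyspark.pandas DataFrames support as well.

```python
import pandas as pd


def drop_age_range(df, low, high, column="Age"):
    """Return df without the rows where low <= df[column] <= high.

    Hypothetical helper: it uses only comparisons and boolean indexing,
    which both pandas and pyspark.pandas DataFrames support.
    """
    in_range = (df[column] >= low) & (df[column] <= high)
    # Keep the rows that are NOT in the range instead of dropping by index.
    return df[~in_range]


# Made-up sample data, built with plain pandas for the demo.
pdf = pd.DataFrame({"Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
                    "Age": [25, 30, 35, 40, 45]})
result = drop_age_range(pdf, 30, 40)
print(result)
```

With the pandas API on Spark, the same call should work on a DataFrame created with ps.from_pandas(pdf) after import pyspark.pandas as ps. Unlike indexNames.to_numpy(), the mask does not pull the matching index values into the driver's memory.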