difference between pyspark.pandas.frame.DataFrame and pyspark.sql.dataframe.DataFrame and their conv

Time:09-13

I could not find any detailed documentation on this point: what is the difference between a pyspark.pandas.frame.DataFrame and a pyspark.sql.dataframe.DataFrame, and where can I find the documentation for their methods?

Also, how do I cast or convert one into the other? Is the conversion always seamless, or are some data types not recognised?

CodePudding user response:

The pyspark-pandas documentation (pyspark-pandas is also known as the pandas API on Spark) covers pyspark.pandas.DataFrame, the type that API creates and works with. For the methods of the native Spark DataFrame (pyspark.sql.DataFrame), look through the Spark documentation.

Each type has a conversion method that turns it into the other:

  • A pyspark DataFrame is converted to a pyspark-pandas DataFrame with to_pandas_on_spark (from Spark 3.3 on, this method is deprecated in favour of pandas_api)
  • A pyspark-pandas DataFrame is converted to a pyspark DataFrame with to_spark
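As a rough sketch of the two conversions, the helpers below wrap the methods named in the answer. The function names spark_to_pandas_on_spark, pandas_on_spark_to_spark and demo are made up for illustration, and the column names and local Spark session in demo are assumptions. The first helper prefers pandas_api when the DataFrame has it and falls back to to_pandas_on_spark otherwise. Note that to_spark does not keep the pandas-on-Spark index unless index_col is given, so the round trip is not always lossless.

```python
def spark_to_pandas_on_spark(sdf):
    """Convert a pyspark.sql.DataFrame to a pyspark.pandas.DataFrame.

    Hypothetical helper: uses pandas_api() where available and
    falls back to to_pandas_on_spark() on versions that lack it.
    """
    if hasattr(sdf, "pandas_api"):
        return sdf.pandas_api()
    return sdf.to_pandas_on_spark()


def pandas_on_spark_to_spark(psdf, index_col=None):
    """Convert a pyspark.pandas.DataFrame to a pyspark.sql.DataFrame.

    Hypothetical helper: passing index_col keeps the index as a
    regular column; with the default (None) the index is dropped.
    """
    return psdf.to_spark(index_col=index_col)


def demo():
    """Round trip on a tiny DataFrame (assumes a working local Spark install)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("df-conversion-sketch")  # arbitrary name for this sketch
        .getOrCreate()
    )
    # Illustrative data; the column names are assumptions.
    sdf = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

    psdf = spark_to_pandas_on_spark(sdf)      # pandas-on-Spark DataFrame
    print(type(psdf))

    sdf_back = pandas_on_spark_to_spark(psdf)  # back to a Spark DataFrame
    print(type(sdf_back))
    sdf_back.printSchema()

    spark.stop()
```

Calling demo() in an environment with pyspark installed prints the type of each intermediate object, which is a quick way to check which of the two DataFrame classes you are holding.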