pyspark table to pandas dataframe

Time:03-16

I have an object of type <class 'pyspark.sql.dataframe.DataFrame'> and I want to convert it to a Pandas DataFrame. The dataset is too big and I only need some columns, so I selected the ones I want with the following:

df = spark.table("sandbox.zitrhr023")
columns= ['X', 'Y', 'Z', 'etc']

and then:

df_new= df.select(*columns).show()

but it returns a NoneType object. When I try the following:

df_new = df_new.toPandas()

It gives the following error:

AttributeError: 'NoneType' object has no attribute 'toPandas'

Do I need to put df_new in a spark dataframe before converting it with toPandas()? How do I do that?

CodePudding user response:

You are calling toPandas() on the result of show(), which only prints the DataFrame and returns None. Call toPandas() directly on the selected DataFrame instead:

df_new = df.select(*columns).toPandas()
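To see why the original assignment ended up as None, here is a minimal sketch that runs without a Spark session. TinyFrame and its to_records() method are hypothetical stand-ins made up for this illustration and are not part of PySpark. They only mimic the pattern: select() returns a new frame, while show() prints and returns nothing.

```python
# Hypothetical stand-in for a Spark DataFrame; TinyFrame is NOT part of PySpark.
class TinyFrame:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one dict per row

    def select(self, *columns):
        # Returns a new frame, so further calls can be chained onto it.
        return TinyFrame([{c: row[c] for c in columns} for row in self.rows])

    def show(self):
        # Prints for inspection only; with no return statement,
        # the call evaluates to None.
        for row in self.rows:
            print(row)

    def to_records(self):
        # Assumed stand-in for a conversion step such as toPandas().
        return list(self.rows)


df = TinyFrame([{"X": 1, "Y": 2, "Z": 3}])
columns = ["X", "Y"]

shown = df.select(*columns).show()  # prints the row; shown is None
df_new = df.select(*columns)        # keep the frame itself
records = df_new.to_records()       # conversion works on the frame
```

If you still want to look at the data, keep the selected frame in a variable, call show() on it as a separate statement, and then convert that same variable.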