I have an object of type <class 'pyspark.sql.dataframe.DataFrame'>
and I want to convert it to a Pandas DataFrame. The dataset is too big and I only need some columns, so I selected the ones I want with the following:
df = spark.table("sandbox.zitrhr023")
columns= ['X', 'Y', 'Z', 'etc']
and then:
df_new= df.select(*columns).show()
but it returns a NoneType object. When I try the following:
df_new = df_new.toPandas()
It gives the following error:
AttributeError: 'NoneType' object has no attribute 'toPandas'
Do I need to put df_new into a Spark DataFrame before converting it with toPandas()? How do I do that?
CodePudding user response:
You are trying to cast it to a Pandas DataFrame after calling show(), which prints the DataFrame and returns None. Can you try the following:
df_new = df.select(*columns).toPandas()
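For reference, a minimal end-to-end sketch, assuming a SparkSession is already available as spark (as it is in most notebook environments) and reusing the table and column names from the question; the limit() hint at the end is my own addition, not part of the original answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the existing session if one is running

df = spark.table("sandbox.zitrhr023")
columns = ['X', 'Y', 'Z', 'etc']

# select() returns a new Spark DataFrame; show() only prints it and returns None
df_new = df.select(*columns)
df_new.show()  # optional preview; do not assign its result

# toPandas() collects the selected columns to the driver as a Pandas DataFrame.
# If the selection is still large, consider df_new.limit(n).toPandas() first.
pandas_df = df_new.toPandas()
print(type(pandas_df))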