convert a pyspark dataframe column in databricks as a list without using rdd

Time:12-02

I am trying to collect the values of a PySpark dataframe column in Databricks as a list.

When I use the collect function,

df.select('col_name').collect()

I get a list of Row objects rather than the plain values.

Based on some searches, using .rdd.flatMap() would do the trick.

However, for security reasons I cannot use the rdd API (it says rdd is not whitelisted). Is there another way to collect a column's values as a list?

CodePudding user response:

If you have a small dataframe (say, a single column), I would suggest converting it to a pandas DataFrame and using the tolist() function:

pdf = df.toPandas()
pdf_list = pdf['col_name'].tolist()

Your output should be something like:

['value1','value2','value3']

Hope that helps.
