I am trying to collect the values of a PySpark DataFrame column in Databricks as a list.
When I use the collect function
df.select('col_name').collect()
I get a list of Row objects rather than the plain values.
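For example, the result looks something like this (values here are just illustrative):

[Row(col_name='value1'), Row(col_name='value2'), Row(col_name='value3')]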
Based on some searching, using .rdd.flatMap() should do the trick.
However, for security reasons (it says rdd is not whitelisted), I cannot use rdd at all. Is there another way to collect a column's values as a list?
CodePudding user response:
If you have a small DataFrame, say with only one column, I would suggest converting it to a pandas DataFrame and using its tolist() function.
# Convert the Spark DataFrame to pandas, then pull the column out as a plain Python list
pdf = df.toPandas()
pdf_list = pdf['col_name'].tolist()
Your output should look something like this:
['value1','value2','value3']
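If pandas is not available on your cluster, a plain list comprehension over the collect() output also gives a flat list and avoids rdd entirely. A minimal sketch, assuming the same col_name column:

# collect() returns a list of Row objects; unpack the single column into a plain list
values = [row['col_name'] for row in df.select('col_name').collect()]

Both approaches pull all the data to the driver, so they are only suitable for reasonably small columns.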
Hope that helps.