For the sample data below, how can I find the most frequently occurring value in the `colour` column? The data type of `colour` is WrappedArray (i.e. an array column), and each array can contain any number of elements. In this example the answer should be yellow, which appears three times, followed by blue, which appears twice. Many thanks for your help.
| Name | Colour |
|------|--------|
| A | ('blue', 'yellow') |
| B | ('pink', 'yellow') |
| C | ('green', 'black') |
| D | ('yellow', 'orange', 'blue') |
Answer:
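I would `explode` the `colour` column, so that each array element becomes its own row, and then run `groupBy` and `count`, ordering by the count in descending order so the most frequent colour comes first.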
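```python
from pyspark.sql.functions import col, explode

# One row per array element, then count occurrences of each colour
df \
    .select(explode('colour').alias('colour')) \
    .groupBy('colour') \
    .count() \
    .orderBy(col('count').desc())
```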
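To check what that pipeline computes without a Spark session, here is a minimal sketch of the same explode-then-count logic in plain Python, using `collections.Counter` as a stand-in for `groupBy().count()`. It assumes the data fits in memory and is only an illustration, not a replacement for the Spark version; `rows` and `colour_counts` are hypothetical names.

```python
from collections import Counter
from itertools import chain

# Sample data from the question as (name, colours) pairs
rows = [
    ("A", ["blue", "yellow"]),
    ("B", ["pink", "yellow"]),
    ("C", ["green", "black"]),
    ("D", ["yellow", "orange", "blue"]),
]

def colour_counts(rows):
    """Flatten every colour array (like explode), count each colour
    (like groupBy + count) and return pairs, most frequent first."""
    exploded = chain.from_iterable(colours for _, colours in rows)
    return Counter(exploded).most_common()

counts = colour_counts(rows)
print(counts[0])  # ('yellow', 3)
```

On the Spark side, one way to get only the top colour is to call `.first()` on the ordered DataFrame, which returns a single `Row`. Rows with equal counts come back in no guaranteed order, so add a secondary sort key such as `col('colour')` if you need a deterministic result.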