I have a column in a CSV file. Using Spark 2.4 and Scala 2.11, I want to count the number of composers in each row, where composers are delimited by |:
+--------------------+
|            composer|
+--------------------+
|            MJ | NEO|
| TEDDY| FUTURE BO...|
|             Kenny G|
|              湯小康|
+--------------------+
Expected output:
+--------------------+
|      count_composer|
+--------------------+
|                   2|
|                   2|
|                   1|
|                   1|
+--------------------+
CodePudding user response:
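Use the split and size functions. Note that split takes a regular expression, so the pipe has to be escaped:
import org.apache.spark.sql.functions.{col, size, split}
// "|" alone is regex alternation; "\\|" matches a literal pipe
val df2 = df.select(size(split(col("composer"), "\\|")).as("count_composer"))
df2.show(false)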
CodePudding user response:
Hi, I came across this related question: scala - String split("|") works incorrectly? The pipe is a regex metacharacter, so it needs to be escaped in the pattern:
scala> df.select($"composer", size(split($"composer", "\\|"))).show(5)
+--------------------+-------------------------+
|            composer|size(split(composer, \|))|
+--------------------+-------------------------+
|                董貞|                        1|
| TEDDY| FUTURE BO...|                        3|
|                null|                       -1|
|              湯小康|                        1|
|         Traditional|                        1|
+--------------------+-------------------------+
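To see why the escape matters without starting Spark, here is a minimal plain-Scala sketch using String.split, which also treats its argument as a regex. The object name PipeSplitDemo and the helper countComposers are hypothetical names made up for this illustration, and returning 0 for null is an assumption of the sketch (the Spark 2.4 output above shows -1 for a null row).

```scala
object PipeSplitDemo {
  // Hypothetical helper (not a Spark API): counts pipe-separated entries in a string.
  // Treating null as 0 is an assumption made for this sketch.
  def countComposers(s: String): Int =
    Option(s).map(_.split("\\|").length).getOrElse(0)

  def main(args: Array[String]): Unit = {
    val row = "MJ | NEO"
    // Unescaped: "|" is a regex alternation of two empty patterns, so it matches
    // between every character and the string is cut into single characters.
    println(row.split("|").mkString("[", "][", "]"))
    // Escaped: "\\|" matches a literal pipe, giving one piece per composer.
    println(row.split("\\|").mkString("[", "][", "]"))
    println(countComposers(row))       // 2
    println(countComposers("Kenny G")) // 1
  }
}
```

The same escaped pattern is what gets passed to Spark's split in the answers above; java.util.regex.Pattern.quote("|") would be another way to build it.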