Manipulating Spark Columns in Scala

Time:12-24

I have a column in a CSV file. Using Spark 2.4 and Scala 2.11, I want to count the number of composers in each row, delimited by |:

+--------------------+
|            composer|
+--------------------+
|            MJ | NEO|
|TEDDY|  FUTURE BO...|
|             Kenny G|
|              湯小康|
+--------------------+

Desired outcome:

+--------------------+
|      count_composer|
+--------------------+
|                   2|
|                   2|
|                   1|
|                   1|
+--------------------+

CodePudding user response:

Use the split and size functions. Note that split takes a regular expression, so the pipe must be escaped:

val df2 = df.select(size(split(col("composer"), "\\|")).as("count_composer"))
df2.show(false)
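Why the escape matters: Spark's split compiles its pattern with java.util.regex, where a bare | is alternation between two empty branches and therefore matches the empty string at every position. A quick plain-Scala check (same regex semantics, no Spark needed) shows the difference:

```scala
// Unescaped "|" matches the empty string at every position,
// so every character (including the "|" itself) becomes its own token.
val wrong = "MJ|NEO".split("|")
println(wrong.length)            // 6, one token per character

// Escaping the pipe matches the literal "|" delimiter.
val right = "MJ|NEO".split("\\|")
println(right.mkString(","))     // MJ,NEO
```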

CodePudding user response:

Hi, I came across this: scala - String split("|") works incorrectly? Since | is a regex metacharacter, it has to be escaped as \\|:

scala> df.select($"composer", size(split($"composer", "\\|"))).show(5)
+--------------------+-------------------------+
|            composer|size(split(composer, \|))|
+--------------------+-------------------------+
|                董貞|                        1|
|TEDDY|  FUTURE BO...|                        3|
|                null|                       -1|
|              湯小康|                        1|
|         Traditional|                        1|
+--------------------+-------------------------+
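The -1 in the null row is just size() returning -1 for a null input. If nulls should instead count as 0, the Spark expression can be guarded with when(col("composer").isNull, 0).otherwise(...). The intended per-row logic, mirrored in plain Scala (a sketch of the semantics, not the Spark API itself):

```scala
// Mirrors size(split(col("composer"), "\\|")), but maps a null
// composer to 0 instead of Spark's -1.
def countComposers(composer: String): Int =
  Option(composer).map(_.split("\\|").length).getOrElse(0)

println(countComposers("MJ | NEO"))  // 2
println(countComposers("Kenny G"))   // 1
println(countComposers(null))        // 0
```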