Spark object not serializable Regex MatchIterator

Time:11-07

I'm super new to Scala and am trying to write a little Spark Streaming program where lines of a text file come in, are stripped of any non-alphanumeric characters, and are then mapped, reduced and printed out.

I've written the code below and am running it using the following command: spark-submit --class group.WordCount --master yarn --deploy-mode client bdpAssignmentFour.jar hdfs:///user/s3797303

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCount {
  def main(args: Array[String]): Unit =  {


    val sconf = new SparkConf().setAppName("SparkWordCount")
    val ssc = new StreamingContext(sconf, Seconds(5))

    val pat = "^[a-zA-Z0-9]*$".r

    val lines = ssc.textFileStream(args(0))
    val lines_map = lines.map(line => pat.findAllIn(line))
    lines_map.print()

    val wordCounts = lines_map.map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

This seems to begin running just fine, but then returns an error about the following line: val lines_map = lines.map(line => pat.findAllIn(line))

The error it gives reads as follows: object not serializable (class: scala.util.matching.Regex$MatchIterator, value: empty iterator)

What can I do to make the object serializable and hopefully end up with some code that runs?

CodePudding user response:

Quick fix:

  • put the pat value inline: lines.map(line => "...".r.findAllIn(line))
  • or move it inside an object outside of your existing method and object

You won't be able to make the regex serializable. Your goal is to move it to a place where Spark won't need to serialize it.
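As a rough sketch of the second option: the error message names Regex$MatchIterator, the lazy iterator returned by findAllIn, so it may also help to turn the matches into plain strings before Spark has to ship them anywhere. The object names TextPatterns and SerializationCheck, the helper isJavaSerializable, and the pattern [a-zA-Z0-9]+ are assumptions made for illustration and are not from the original code. The sketch uses plain Scala so it can be tried without a cluster.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical holder object (name is an assumption). Members of a top-level
// object are looked up on whichever JVM runs the code, so the pattern is not
// captured inside the closure passed to map/flatMap.
object TextPatterns {
  // Assumed pattern: runs of alphanumeric characters. The anchored pattern
  // from the question only matches lines that are entirely alphanumeric.
  val word = "[a-zA-Z0-9]+".r

  // Materialise the lazy MatchIterator into a list of plain strings.
  def tokens(line: String): List[String] = word.findAllIn(line).toList
}

// Hypothetical helper to check what Java serialization accepts.
object SerializationCheck {
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

In the streaming job this would presumably be used as lines.flatMap(line => TextPatterns.tokens(line)), followed by the existing map((_, 1)) and reduceByKey steps, so that the stream carries strings rather than iterators.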
