Filter an RDD[String] based on a data indicator if it is present, otherwise filter based on the header and trailer

Time: 10-01

I have a CSV file where each line may or may not begin with a data indicator. I need to check whether the data indicator is present. If it is, I have to read the records based on it; otherwise, I have to read the records based on the header and trailer indicators. How can I achieve this?

Currently I have this code:

val rdd = spark.sparkContext.textFile(path)

val rdd1 = rdd.filter(x=>x.startsWith(dataIndicator.get))

But this code fails when the dataIndicator field is missing from the input file. dataIndicator is defined as an Option[String] in a case class, so calling .get on None throws a java.util.NoSuchElementException.
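To make the failure mode concrete, here is a minimal Spark-free sketch (the object and method names are hypothetical, not from the original post). It shows that unwrapping None with .get throws, while getOrElse supplies a fallback value for the missing case:

```scala
import scala.util.Try

// Hypothetical demo object, not part of the original question or answer
object OptionGetDemo {
  // Mirrors the question's code: unconditionally unwraps the Option
  def unsafeIndicator(dataIndicator: Option[String]): String =
    dataIndicator.get // throws java.util.NoSuchElementException when dataIndicator is None

  // Safe alternative: returns the fallback when dataIndicator is None
  def indicatorOrDefault(dataIndicator: Option[String], fallback: String): String =
    dataIndicator.getOrElse(fallback)

  def main(args: Array[String]): Unit = {
    println(Try(unsafeIndicator(None)).isFailure)   // true
    println(indicatorOrDefault(None, "H"))          // H
    println(indicatorOrDefault(Some("D"), "H"))     // D
  }
}
```

If the fallback really is just another prefix, the question's filter could become rdd.filter(x => x.startsWith(dataIndicator.getOrElse(fallbackIndicator))), where fallbackIndicator is a hypothetical String you would define yourself.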

Is there a way to handle this?

Answer:

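// dataIndicator is an Option[String]; headerAndTrailerIndicator is assumed to be a String defined elsewhere
val rdd1 = dataIndicator match {
  // Indicator present: keep only the lines that begin with it
  case Some(di) => rdd.filter(x => x.startsWith(di))
  // Indicator missing: fall back to the header/trailer indicator instead of calling .get on None
  case None => rdd.filter(x => x.startsWith(headerAndTrailerIndicator))
}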
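If "based on the header and trailer" means keeping the data records that sit between a header line and a trailer line, rather than keeping lines that share one prefix, the None branch needs different logic. The sketch below assumes two hypothetical indicators, headerIndicator and trailerIndicator (neither appears in the original post), and runs on a plain Seq[String] so it executes without Spark. It uses only filter with a negated predicate, which RDD also provides, so the same lambdas can be passed to rdd.filter.

```scala
// Hypothetical helper; headerIndicator and trailerIndicator are assumed names, not from the original post
object RecordSelector {
  def selectRecords(
      lines: Seq[String],
      dataIndicator: Option[String],
      headerIndicator: String,
      trailerIndicator: String
  ): Seq[String] =
    dataIndicator match {
      // Data indicator configured: keep only lines that begin with it
      case Some(di) => lines.filter(line => line.startsWith(di))
      // No data indicator: drop header and trailer lines, keep the records in between
      case None =>
        lines.filter(line => !line.startsWith(headerIndicator) && !line.startsWith(trailerIndicator))
    }

  def main(args: Array[String]): Unit = {
    val withIndicator = Seq("H|2024", "D|a,1", "D|b,2", "T|2")
    val withoutIndicator = Seq("H|2024", "a,1", "b,2", "T|2")
    println(selectRecords(withIndicator, Some("D|"), "H|", "T|"))  // List(D|a,1, D|b,2)
    println(selectRecords(withoutIndicator, None, "H|", "T|"))     // List(a,1, b,2)
  }
}
```

This classifies lines purely by prefix, so a data record that happens to start with the header or trailer indicator would be dropped. If the header and trailer are always the first and last lines of the file, dropping them by position (for example with RDD.zipWithIndex) avoids that risk.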