Spark RDD operation question

When processing a log file, I need to extract a specified column from each line of an RDD. What is a more efficient way to do this kind of string operation?
My current approach is rdd.collect(): I convert the RDD to an Array and then operate on the array.

CodePudding user response:

Use a map operation on the RDD: call split on every line to break it into fields, then take the specified column.
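
A minimal sketch of that idea in PySpark, assuming space-separated log lines and a zero-based column index. The helper names (extract_column, select_column, example) and the file name access.log are made up for illustration and are not from the original question. Because map is a transformation, the splitting runs on the executors and nothing has to be collected to the driver first.

```python
def extract_column(line, index, sep=" "):
    """Split one log line and return the field at `index`, or None if the line is too short."""
    fields = line.split(sep)
    return fields[index] if 0 <= index < len(fields) else None


def select_column(rdd, index, sep=" "):
    """Return a new RDD holding only the requested column of every line."""
    # map() is lazy: Spark applies the function per partition when an action runs.
    return rdd.map(lambda line: extract_column(line, index, sep))


def example(path="access.log"):
    """Hypothetical usage; needs a working Spark installation and an existing file."""
    from pyspark import SparkContext  # imported lazily so the helpers above work without Spark

    sc = SparkContext.getOrCreate()
    lines = sc.textFile(path)
    # take(5) brings only a small sample to the driver, unlike collect().
    return select_column(lines, 1).take(5)
```

If the full result is still needed on the driver, calling collect() after the map moves only the selected column instead of every whole line.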

CodePudding user response:

Try Spark SQL: load the structured data file as a DataFrame, then work with the file data the same way you would with a database table, including filter, group, and agg.
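
A rough PySpark sketch of this approach. The column names level, service, and bytes, the space delimiter, and the function names are all assumptions for illustration, since the thread does not describe the log layout.

```python
def error_bytes_per_service(df):
    """Filter, group, and aggregate a log DataFrame.

    Assumes (hypothetically) that `df` has the columns level, service, and bytes.
    """
    return (
        df.filter("level = 'ERROR'")   # keep only error rows (SQL expression string)
          .groupBy("service")          # one group per service
          .agg({"bytes": "sum"})       # total bytes per group
    )


def load_log(path):
    """Hypothetical loader; needs a working Spark installation and an existing file."""
    from pyspark.sql import SparkSession  # imported lazily so the function above works without Spark

    spark = SparkSession.builder.getOrCreate()
    # Read a space-delimited file and name its three columns.
    return spark.read.csv(path, sep=" ", inferSchema=True).toDF("level", "service", "bytes")
```

Calling error_bytes_per_service(load_log("access.log")).show() would print the aggregated table. The work stays distributed until show() runs.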

CodePudding user response:

Use rdd.flatMap().
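
One way flatMap could help here, sketched in PySpark under the same assumptions as before (space-separated lines, zero-based index, hypothetical helper names): the function returns a list per line, so lines that are too short contribute nothing and are dropped in the same pass.

```python
def column_if_present(line, index, sep=" "):
    """Return [field] when the line has the requested column, otherwise an empty list."""
    fields = line.split(sep)
    return [fields[index]] if 0 <= index < len(fields) else []


def select_column_skipping_short_lines(rdd, index, sep=" "):
    """Return an RDD of the requested column, silently skipping malformed lines."""
    # flatMap() flattens the per-line lists, so empty lists leave no element behind.
    return rdd.flatMap(lambda line: column_if_present(line, index, sep))
```

Compared with map, this avoids a separate filter step for malformed lines, at the cost of hiding how many lines were skipped.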