Spark's wholeTextFiles interface

Time:09-19

In most cases, when we read files with Spark we call the SparkContext.textFile() method. But when the input is a large number of small files, reading them this way is inefficient, because a task is created for every small file. So I checked the official documentation and learned that there is another file input interface called wholeTextFiles.

But after I read the files with this interface, the file contents come back joined into a single string as the content of the returned RDD, and when I split it the fields come out jumbled. Has any expert here used this interface?

CodePudding user response:

This is equivalent to traversing a folder and turning all of its data into key-value pairs: the key is the file path and the value is the file content!
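
To make that analogy concrete, here is a minimal plain-Java sketch of "traverse a folder and build path -> content pairs". It does not use Spark and is not how Spark implements wholeTextFiles; the class name WholeFilesSketch, the helper demoDir, and the sample file contents are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WholeFilesSketch {

    // Plain-Java analogy only: one (path -> whole content) entry per regular file.
    static Map<String, String> readWholeFiles(Path dir) {
        Map<String, String> pairs = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> regular = files.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path file : regular) {
                String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                pairs.put(file.toString(), content);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    // Hypothetical sample data: two small files in a temporary folder.
    static Path demoDir() {
        try {
            Path dir = Files.createTempDirectory("small-files");
            Files.write(dir.resolve("a.txt"), "1,foo\n2,bar\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("b.txt"), "3,baz\n".getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each entry holds one whole file, newlines included.
        readWholeFiles(demoDir()).forEach((path, content) ->
            System.out.println(path + " -> " + content.length() + " chars"));
    }
}
```

The point to notice is that each value is an entire file, so the line breaks are still inside the string and have to be split out before the fields are.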

CodePudding user response:

Hello! In Java you can use StringUtils.split(content, SeparatorUtil.separator_next) to get an array, where SeparatorUtil.separator_next is '\n'. Hope this helps.
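
A rough sketch of that idea: split each file's content on '\n' first, then split every line into fields. It uses the JDK's String.split in place of StringUtils.split so that it runs without an extra library; the class name SplitContentSketch, the method parseRecords, and the comma field separator are assumptions for illustration, not something from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitContentSketch {

    // Split one file's whole content into lines, then each line into fields.
    static List<String[]> parseRecords(String content, String fieldSeparator) {
        List<String[]> records = new ArrayList<>();
        for (String line : content.split("\n")) {
            // Trim surrounding whitespace; this also drops a trailing '\r'.
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                // Quote the separator so it is matched literally, not as a regex.
                records.add(trimmed.split(Pattern.quote(fieldSeparator)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical content of one small file, as a single string.
        String content = "1,foo\n2,bar\n";
        for (String[] fields : parseRecords(content, ",")) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

With wholeTextFiles, the same splitting would be applied to the value of each (path, content) pair, so lines from different files are never mixed before the fields are cut.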