With the SparkContext.textFile() method, reading input that consists of a large number of small files is inefficient, because a task is created for every small file. I checked the official documentation and found that it describes another file input interface called wholeTextFiles.
However, after reading files through this interface, the content comes back in the RDD with several files spliced into one string, and splitting it yields disorganized fields. Has anyone here used this interface?
CodePudding user response:
This is equivalent to traversing a folder and returning all the data as key-value pairs: the key is the file path and the value is the file content.
CodePudding user response:
Hello, in Java you can call StringUtils.split(content, SeparatorUtil.separator_next) on the content above, which returns an array. SeparatorUtil.separator_next is '\n'. Hope this helps.
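
As a rough illustration of the two replies above, here is a minimal plain-Java sketch with no Spark dependency. It treats each input as a (path, content) pair and splits the content into lines before cutting fields. The class name `WholeFileSplit`, the helper `toLines`, the sample paths and contents, and the comma delimiter are all assumptions made for the example. The JDK's `String.split` stands in for `StringUtils.split`, and empty lines are skipped by choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WholeFileSplit {
    // Splits one file's full content into lines. The regex \R matches
    // \n, \r\n and \r, so Windows-style files are handled as well.
    public static List<String> toLines(String content) {
        List<String> lines = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (!line.isEmpty()) { // skip blank lines
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for (path, content) pairs: key = path, value = content.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("hdfs://example/a.txt", "1,alice\n2,bob\n");
        files.put("hdfs://example/b.txt", "3,carol\r\n4,dave");

        for (Map.Entry<String, String> file : files.entrySet()) {
            // First cut the whole-file string into lines, then cut each line into fields.
            for (String line : toLines(file.getValue())) {
                String[] fields = line.split(","); // assumed field delimiter
                System.out.println(file.getKey() + " -> " + String.join("|", fields));
            }
        }
    }
}
```

In a Spark job the same line split would typically sit inside a flatMap over the pairs that wholeTextFiles returns, so each record becomes one line again before the fields are cut.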