Storing Lucene index files on the Hadoop HDFS file system

Time:09-22

Due to business needs, we have to tokenize a very large data set (billions of records) and build an index from it. We currently run four Solr clusters to serve queries, and performance is reportedly good. However, the data volume keeps growing: about 40 GB of new data must be added to the index every day, so the index files will eventually become the bottleneck as the system keeps expanding. We are now considering storing the index files on the Hadoop HDFS file system. After checking a lot of information online, I found that many people say HDFS does not support random writes (while Lucene index access involves random reads and writes). The suggested workaround is to build the index locally or in memory first and then write it to HDFS; at query time, the index is first read from HDFS into memory, and lookups are served from memory...

Question: with 40 GB of new data per day, building the whole index in memory is unrealistic, and writing it locally first and then syncing it to HDFS, then pulling the index from HDFS back to local disk at query time, can hardly be efficient. Is there any good solution? Any help would be greatly appreciated.
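For reference, the workaround described above (build the index on local disk, then ship the finished segment files to HDFS, and pull them back before querying) can be sketched with the standard Hadoop file system shell. The paths and directory names below are illustrative, not from the original post:

```shell
# Build the Lucene index on local disk first (with your own indexer job),
# then copy the finished segment files to HDFS in one batch:
hdfs dfs -mkdir -p /indexes/batch-001
hdfs dfs -put /data/local-index/* /indexes/batch-001/

# At query time, pull the index back into a local disk cache
# before opening it with a Lucene IndexReader:
hdfs dfs -get /indexes/batch-001 /data/index-cache/
```

Because HDFS writes are append-only, this batch-copy pattern sidesteps the random-write limitation: Lucene only ever writes whole, immutable segment files, which is exactly what HDFS handles well.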

CodePudding user response:

I wrote some code for this before: the Lucene index files are generated by a MapReduce job and then served with Solr. Solr can use HDFS as its storage.
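For what it's worth, Solr's HDFS support is switched on through the directory factory in solrconfig.xml. A minimal sketch is below; the NameNode address and path are placeholders you would replace with your own:

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Root HDFS directory under which Solr stores index data -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <!-- Block cache keeps hot index blocks in memory so reads
       do not always go out to HDFS -->
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
```

Solr also needs to be started with the HDFS lock type (`-Dsolr.lock.type=hdfs`) so that index locking works on HDFS rather than the local file system.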

CodePudding user response:

You could also look at ElasticSearch. It is not as troublesome to set up as a distributed Hadoop cluster, and it is another third-party open-source solution built on top of Lucene, so it is worth a try. As I said before, the Lucene indexes were generated with MR; contact me (2012) if necessary.
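To make the ElasticSearch suggestion concrete: ES hides the distribution problem by sharding the Lucene index across nodes itself, so you never manage index files on HDFS by hand. A minimal sketch, assuming a node running on localhost:9200 (the host, index name, and field are illustrative):

```shell
# Create an index with enough shards to spread billions of
# documents across the cluster; replicas give read scaling.
curl -X PUT 'http://localhost:9200/docs' -d '{
  "settings": { "number_of_shards": 8, "number_of_replicas": 1 }
}'

# Index one document; ES routes it to a shard and replicates it,
# so daily growth is absorbed by adding nodes, not by moving files.
curl -X POST 'http://localhost:9200/docs/doc' -d '{
  "body": "some tokenized text"
}'
```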

CodePudding user response:

Thank you very much for providing a solution. The implementation still has to be studied slowly, but at least I have a direction now. THX

CodePudding user response:

Can Solr access HDFS files directly?

CodePudding user response:

I don't know whether this is the same problem, but we first write the index to HDFS and then pull it down to local disk. Writing directly to HDFS produces fewer segment files than writing directly to disk, and I don't know why.