SQL Server full-text index: a Chinese word segmentation pitfall, looking for a solution

Time:12-22

As everyone knows, SQL Server full-text search is weak at Chinese word segmentation, and its segmentation results are inconsistent,
so it cannot return the complete result set that like '%...%' returns.

First, some examples.
1. select * from sys.dm_fts_parser('jinan Yellow River bridge engineering company', 2052, null, 0) returns the following:
display_term
Jinan
The Yellow River
Luqiao
Engineering
The company

2. select * from sys.dm_fts_parser('jinan Yellow River bridge construction group co., LTD.', 2052, null, 0) returns the following:
display_term
Jinan
The Yellow River
Luqiao
Construction of
Group
Co., LTD.
The company

In 1 and 2 the segmentation is in line with expectations, but searching for "Yellow River bridge" returns no results. Why? Read on.

3. select * from sys.dm_fts_parser('Yellow River bridge', 2052, null, 0) returns the following:
display_term
Dimness
Bridge

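Yes: on its own, "Yellow River bridge" is not split into Yellow River + Luqiao but into Dimness + Bridge. The query terms therefore do not match the terms stored in the index for the company names, which is why searching for "Yellow River bridge" returns no results.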

The other day, by mistake, I set the word breaker language of the full-text index to "Chinese - Taiwan" (LCID 1028), and the search results turned out much better than with "Simplified Chinese" (LCID 2052).
Searching for "Yellow River bridge" now returns results.
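
For reference, the word breaker language is chosen per column when the full-text index is defined. A minimal sketch, assuming a hypothetical table dbo.Companies with a CompanyName column and a primary key index named PK_Companies (none of these names appear in the post, and the catalog name is also made up):

```sql
-- Hypothetical schema: dbo.Companies(Id int, CompanyName nvarchar(200)),
-- with PK_Companies as its unique, single-column key index.
CREATE FULLTEXT CATALOG ftCompanies;

CREATE FULLTEXT INDEX ON dbo.Companies
(
    CompanyName LANGUAGE 1028   -- "Chinese - Taiwan" word breaker instead of 2052
)
KEY INDEX PK_Companies
ON ftCompanies
WITH CHANGE_TRACKING AUTO;
```

If the column is already indexed with language 2052, it would have to be dropped from the index and added again with the new language, which triggers a repopulation.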

Here is the segmentation result under that language:
select * from sys.dm_fts_parser('jinan Yellow River bridge engineering company', 1028, null, 0)
display_term
Dhi
South
Yellow
River
Lk
Bridge
Engineering
The company
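
To see the two word breakers side by side, a query along these lines could be used. It is only a sketch: the Chinese literal is my assumed reconstruction of the company name above (the original characters did not survive in this post), so substitute your own text.

```sql
-- Assumed input: a back-translation of "jinan Yellow River bridge engineering company".
-- The inner double quotes make the parser treat the text as one phrase.
DECLARE @q nvarchar(200) = N'"济南黄河路桥工程公司"';

SELECT 2052 AS lcid, occurrence, display_term
FROM sys.dm_fts_parser(@q, 2052, NULL, 0)   -- Simplified Chinese
UNION ALL
SELECT 1028 AS lcid, occurrence, display_term
FROM sys.dm_fts_parser(@q, 1028, NULL, 0)   -- Chinese - Taiwan
ORDER BY lcid, occurrence;
```

The occurrence column gives the position of each term, which makes it easier to line up where the two segmentations diverge.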

Almost every term is a single character. In other words, if everything were segmented into single characters, we could get result sets nearly as complete as like '%...%' while keeping the speed of the full-text index.
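
One way to check how close the full-text results come to the like results is to list the rows that like finds but CONTAINS misses. A rough sketch, again assuming the hypothetical dbo.Companies(Id, CompanyName) table with a full-text index on CompanyName, and an assumed Chinese keyword standing in for "Yellow River bridge":

```sql
-- Assumed keyword; replace with the text you actually search for.
DECLARE @kw nvarchar(50) = N'黄河路桥';
DECLARE @likePattern nvarchar(60) = N'%' + @kw + N'%';
DECLARE @ftsPhrase nvarchar(60) = N'"' + @kw + N'"';

-- Rows matched by LIKE that the full-text query does not return.
SELECT Id, CompanyName
FROM dbo.Companies
WHERE CompanyName LIKE @likePattern
EXCEPT
SELECT Id, CompanyName
FROM dbo.Companies
WHERE CONTAINS(CompanyName, @ftsPhrase);
```

An empty result would mean that, for this keyword, the full-text index is as complete as like.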

Question: how can I make SQL Server segment text by single characters only?