High-dimensional data

Time:10-07

The concept of high-dimensional data is not hard to grasp: put simply, it is data with many dimensions. The data we usually work with is one-dimensional, or can be written as a two-dimensional table. High-dimensional data is analogous, but it has more dimensions and is therefore difficult to represent visually.
High-dimensional data mining is an active research focus. Its main characteristics are as follows:
High-dimensional data mining is data mining performed on data with a large number of dimensions; this high dimensionality is what mainly distinguishes it from traditional data mining. It has become both a focus and a major difficulty in the field. As technology advances, data collection becomes easier, so databases keep growing in scale and complexity. Examples include various kinds of trade transaction data, Web documents, gene expression data, document frequency data, user rating data, Web usage data, and multimedia data, whose dimensions (attributes) can usually reach the hundreds or thousands, or even more.
Because high-dimensional data is so widespread, research on high-dimensional data mining is very important. However, the "curse of dimensionality" makes such mining unusually difficult, and special techniques are required to handle it. As the number of dimensions grows, the performance of high-dimensional index structures drops rapidly. In low-dimensional spaces, Euclidean distance is often used as the similarity measure between data points, but in high-dimensional spaces this notion of similarity frequently no longer holds. This poses a serious challenge for high-dimensional data mining: on the one hand, data mining algorithms based on index structures degrade in performance; on the other hand, many mining methods based on full-space distance functions fail outright. Possible solutions include the following: reduce the data from a high-dimensional space to a lower-dimensional one through dimension reduction, and then apply low-dimensional processing methods; address the loss of algorithm efficiency by designing more effective index structures and by using incremental and parallel algorithms; and address the failure of distance-based methods by redefining the relevant concepts so that the methods become usable again.
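The weakening of Euclidean distance as a similarity measure can be illustrated with a small experiment. The following is a minimal sketch and is not taken from the original text: it assumes points drawn uniformly at random from the unit hypercube, and the function name relative_contrast and its parameters are hypothetical, chosen only for illustration. It measures how much farther the farthest point is from a query than the nearest point, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Assumption for illustration: all coordinates are uniform on [0, 1).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same value
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # A small relative contrast means the nearest and farthest neighbours
    # are almost equally far away, so distance discriminates poorly.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions the relative contrast should shrink as the dimension grows, which is one way to see why full-space distance functions become less useful and why dimension reduction or a redefined similarity measure may be needed. Real data sets are rarely uniform, so the effect can be weaker or stronger in practice.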