Chapter 1: Introduction
Applications of advanced data analysis techniques,
Business and industry,
Business intelligence applications,
Targeted marketing, customer profiling, workflow management, store layout, fraud detection, automated buying and selling
Internet-based services,
Spam filtering, answering search queries, suggesting social network updates and connections
Mobile devices and mobile sensors,
Smart home systems, smart city planning
Medicine, science, and engineering,
The global climate system,
Genomic data,
Electronic health record data,
What is data mining,
Definition,
The process of automatically discovering useful information in large databases
Knowledge discovery in databases (KDD),
Data mining is an integral part of the KDD process
Input data,
Stored in a variety of formats
Data preprocessing,
The most time-consuming and laborious step
Data mining,
Post-processing,
Information,
Challenges of data mining,
Scalability,
Reason: data sets reaching terabytes (TB), petabytes (PB), or exabytes (EB)
Methods,
Out-of-core algorithms, sampling techniques, parallel and distributed algorithms
High dimensionality,
Reason,
Large numbers of attributes
Temporal and spatial components
Heterogeneous and complex data,
Non-traditional data types,
Data ownership and distribution,
Data belong to several institutions,
Distributed data mining techniques
Non-traditional analysis,
Automatically generating and evaluating hypotheses
Data are opportunistic samples rather than random samples
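The notes list sampling techniques among the methods for scalability without naming one. As a minimal sketch, assuming reservoir sampling as the illustrative technique (it is not named in the notes), the function below draws a fixed-size uniform sample from a stream too large to hold in memory. The name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Illustrative helper: the stream is read once and only k items are
    kept in memory, so the input may be arbitrarily large.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Draw 5 records from a stream of one million without storing the stream.
sample = reservoir_sample(range(1_000_000), k=5, seed=0)
```

A mining algorithm could then run on `sample` instead of the full data set, trading some accuracy for a much smaller memory footprint.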
Origins of data mining,
Late 1980s
Draws on,
Adopts,
Optimization, evolutionary computation, information theory, signal processing, visualization, and information retrieval
Supporting areas,
Distributed techniques: processing massive amounts of data,
Data science and data-driven discovery,
Data science,
An interdisciplinary field that studies and applies tools and techniques for deriving useful insights from data,
Constituent fields,
Data mining, statistics, artificial intelligence, machine learning, pattern recognition, database technology, distributed and parallel computing
The data-driven approach of data science,
Discover patterns and relationships directly from the data
Success story,
Advances in neural networks, known as deep learning
Data mining tasks,
Divided into predictive tasks and descriptive tasks
Predictive tasks,
Predict the value of a particular attribute based on the values of other attributes
Descriptive tasks,
Derive patterns that summarize the underlying relationships in the data
Four core mining tasks,
Predictive modeling,
Definition,
Build a model for the target variable as a function of the explanatory variables,
Two types of task,
Classification,
Predicts discrete target variables
Regression,
Predicts continuous target variables
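To make the distinction between the two predictive tasks concrete, here is a minimal sketch with made-up data. The nearest-neighbor classifier and the least-squares line are illustrative choices, not methods prescribed by these notes, and the names `nearest_neighbor_label` and `fit_line` are hypothetical.

```python
def nearest_neighbor_label(train, x):
    """Classification: return the discrete label of the closest training point.

    train is a list of (feature_value, label) pairs.
    """
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Discrete target: the prediction is one of a fixed set of labels.
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
label = nearest_neighbor_label(train, 7.5)

# Continuous target: the prediction is a real number.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
predicted = slope * 5 + intercept
```

In both cases the model maps explanatory variables to a target variable; only the type of the target differs.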
Association analysis,
Definition,
Discover patterns that describe strongly associated features in the data
Form,
Implication rules or feature subsets
Goal,
Extract the most interesting patterns in an efficient manner
Applications,
Finding groups of genes with related functionality, identifying web pages that are accessed together, understanding the relationships between different elements of Earth's climate system, etc.,
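The notes do not say how the strength of an implication rule is measured. As a sketch, assuming the commonly used support and confidence measures (an assumption, since neither is named here) and a small made-up set of transactions, a rule such as {diapers} -> {beer} could be scored as follows. The function names are hypothetical.

```python
# Made-up market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
```

Here three of the five transactions contain both items, and three of the four transactions with diapers also contain beer, so the rule has support 0.6 and confidence 0.75.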
Cluster analysis,
Definition,
Find groups of closely related observations, such that observations in the same cluster are more similar to each other than to observations in other clusters
Applications,
Grouping related customers, finding ocean areas that have a significant impact on Earth's climate, and compressing data
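One way to see the definition in action is a small k-means loop on one-dimensional data. This is a minimal sketch: k-means is chosen here only for illustration and is not named in these notes, and the function `kmeans_1d`, its fixed starting centers, and the data are all hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the given starting centers.

    Repeats two steps: assign each point to its nearest center, then
    move each center to the mean of the points assigned to it.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

The points near 1 end up in one cluster and the points near 9.5 in the other, so members of a cluster are closer to each other than to members of the other cluster.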
Anomaly detection,
Definition,
Identify observations whose characteristics differ significantly from the rest of the data
Such observations,
Anomalies or outliers
Applications,
Detecting fraud, network intrusions, unusual patterns of disease, and ecosystem disturbances
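As a minimal sketch of the definition, assuming a simple z-score rule (one of many possible detectors, and not one named in these notes), the code below flags values that lie far from the mean of a made-up series. The function name `zscore_outliers` and the threshold of 2.0 are hypothetical choices.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Made-up daily transaction amounts with one unusual value.
amounts = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(amounts)
```

Only the value 50 is flagged in this series. Because a large outlier inflates both the mean and the standard deviation, robust variants based on the median are often preferred in practice.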