For a systematic exposition of other machine learning and deep learning algorithms, see the book "Machine Learning: Principles, Algorithms and Applications" (Tsinghua University Press) by Lei Ming, author of the SIGAI public account.
Book purchase link
Book errata, improvements, and source code resources
This article annotates the classic papers in machine learning and deep learning. To lighten the reading load, only the most classic batch is listed for each topic; readers can expand the list according to their actual needs.
Machine learning theory
PAC (probably approximately correct) learning theory
[1] Valiant, L. A Theory of the Learnable. Communications of the ACM, 27, 1984.
VC dimension (Vapnik-Chervonenkis dimension)
[1] Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K. Learnability and the Vapnik-Chervonenkis Dimension. Journal of the ACM, 36(4): 929-965, 1989.
[2] Natarajan, B.K. On Learning Sets and Functions. Machine Learning, 4: 67-97, 1989.
[3] Karpinski, Marek, Macintyre, Angus. Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks. Journal of Computer and System Sciences, 54(1): 169-176, 1997.
Generalization theory
[1] Wolpert, D.H., Macready, W.G. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation, 1(1): 67-82, 1997.
[2] Wolpert, David. The Lack of A Priori Distinctions Between Learning Algorithms. Neural Computation, pp. 1341-1390, 1996.
[3] Wolpert, D.H., Macready, W.G. Coevolutionary Free Lunches. IEEE Transactions on Evolutionary Computation, 9(6): 721-735, 2005.
[4] Whitley, Darrell, and Jean Paul Watson. Complexity Theory and the No Free Lunch Theorem. In Search Methodologies, pp. 317-339. Springer, Boston, MA, 2005.
[5] Kawaguchi, K., Kaelbling, L.P., and Bengio, Y. Generalization in Deep Learning. 2017.
[6] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. Understanding Deep Learning Requires Rethinking Generalization. International Conference on Learning Representations, 2017.
Optimization theory and methods
[1] L. Bottou. Stochastic Gradient Descent Tricks. Neural Networks: Tricks of the Trade. Springer, 2012.
[2] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the Importance of Initialization and Momentum in Deep Learning. Proceedings of the 30th International Conference on Machine Learning, 2013.
[3] J. Duchi, E. Hazan, and Y. Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 2011.
[4] M. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv preprint, 2012.
[5] T. Tieleman and G. Hinton. RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning. Technical report, 2012.
[6] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, 2015.
[7] Moritz Hardt, Ben Recht, and Yoram Singer. Train Faster, Generalize Better: Stability of Stochastic Gradient Descent. Proceedings of the 33rd International Conference on Machine Learning, 2016.
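To make the update rules discussed in these papers concrete, here is a minimal NumPy sketch of plain gradient descent, classical momentum, and Adam applied to a toy quadratic objective. The objective, step counts, and hyperparameter values are illustrative assumptions, not settings taken from the cited papers.

```python
import numpy as np

# Toy objective f(w) = 0.5 * ||w||^2, so the gradient is simply w.
# The objective and all hyperparameters below are assumptions for illustration.
def grad(w):
    return w

def sgd(w, lr=0.1, steps=100):
    # Plain gradient step: w <- w - lr * g
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def sgd_momentum(w, lr=0.1, mu=0.9, steps=100):
    # Classical momentum (cf. Sutskever et al., 2013): v <- mu*v - lr*g; w <- w + v
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w = w + v
    return w

def adam(w, lr=0.01, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    # Adam (Kingma & Ba, 2015): bias-corrected first and second moment estimates.
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([3.0, -2.0])
print(sgd(w0.copy()), sgd_momentum(w0.copy()), adam(w0.copy()))
```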
Decision trees
[1] Breiman, L., Friedman, J., Olshen, R., and Stone, C. Classification and Regression Trees. Wadsworth, 1984.
[2] J. Ross Quinlan. Induction of Decision Trees. Machine Learning, 1(1): 81-106, 1986.
[3] J. Ross Quinlan. Learning Efficient Classification Procedures and Their Application to Chess End Games. 1983.
[4] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, CA, 1993.
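As a companion to the ID3/C4.5/CART references above, the sketch below computes the information-gain split criterion these tree learners rely on. The one-feature toy dataset and the candidate threshold are illustrative assumptions.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a vector of class labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, threshold):
    # Gain of splitting on `feature <= threshold`: parent entropy minus
    # the size-weighted entropy of the two child nodes.
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - child

# Toy data: one numeric feature, binary labels (assumed for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(x, y, threshold=3.5))  # perfect split -> 1.0 bit
```

ID3 picks the split with maximal gain; C4.5 normalizes it by the split entropy (gain ratio), and CART uses Gini impurity instead, but the mechanics are the same.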
Bayesian classifier
[1] Rish, Irina. An Empirical Study of the Naive Bayes Classifier. IJCAI Workshop on Empirical Methods in Artificial Intelligence, 2001.
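To accompany the Rish (2001) study, here is a minimal Gaussian naive Bayes sketch: per-class priors plus per-feature Gaussians under the conditional-independence assumption. The synthetic two-class dataset and the variance-smoothing constant are assumptions for illustration.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    # Estimate class priors and per-feature means/variances for each class.
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    variances = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # smoothed
    return classes, priors, means, variances

def predict(X, model):
    classes, priors, means, variances = model
    preds = []
    for x in X:
        # Log posterior up to a constant: log prior + sum of Gaussian log-likelihoods.
        scores = [np.log(priors[c])
                  - 0.5 * np.sum(np.log(2 * np.pi * variances[c])
                                 + (x - means[c]) ** 2 / variances[c])
                  for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

# Synthetic, well-separated two-class data (assumption for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print((predict(X, fit_gaussian_nb(X, y)) == y).mean())  # training accuracy
```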
Data dimension reduction
Principal component analysis (PCA)
[1] Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, 2(11): 559-572, 1901.
[2] Jolliffe, I.T. Principal Component Analysis. Springer Verlag, New York, 1986.
[3] Scholkopf, B., Smola, A., Muller, K.-R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5): 1299-1319, 1998.
[4] Sebastian Mika, Bernhard Scholkopf, Alexander J. Smola, Klaus-Robert Muller, Matthias Scholz, Gunnar Ratsch. Kernel PCA and De-noising in Feature Spaces. Neural Information Processing Systems, 1999.
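Alongside the linear and kernel PCA references above, the following NumPy sketch performs linear PCA by eigendecomposition of the sample covariance matrix and projects the data onto the top components. The random 5-dimensional data and the choice of two components are assumptions for illustration.

```python
import numpy as np

def pca(X, k):
    # Center the data, then keep the top-k eigenvectors of the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the top-k components
    components = eigvecs[:, order]
    return Xc @ components, components, eigvals[order]

# Illustrative data: 100 correlated samples in 5 dimensions (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
Z, W, explained = pca(X, k=2)
print(Z.shape, explained)  # (100, 2) and the two largest eigenvalues
```

Kernel PCA (Scholkopf et al., 1998) replaces the covariance eigenproblem with an eigendecomposition of the centered kernel matrix, but the projection idea is the same.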
Manifold learning
[1] Roweis, Sam T. and Saul, Lawrence K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500): 2323-2326, 2000.
[2] Belkin, Mikhail and Niyogi, Partha. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6): 1373-1396, 2003.
[3] He, Xiaofei and Niyogi, Partha. Locality preserving projections. NIPS, 2003: 234-241.
[4] Tenenbaum, Joshua B., De Silva, Vin, and Langford, John C. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500): 2319-2323, 2000.
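To try these methods quickly, the sketch below runs scikit-learn's implementations of locally linear embedding and Isomap on a synthetic swiss-roll dataset; the neighborhood size, component count, and sample count are illustrative assumptions rather than values from the papers.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# Swiss roll: a 2D manifold embedded in 3D (synthetic data, assumed for illustration).
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Locally linear embedding (Roweis & Saul, 2000).
Z_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)

# Isomap (Tenenbaum, de Silva & Langford, 2000).
Z_iso = Isomap(n_neighbors=12, n_components=2).fit_transform(X)

print(Z_lle.shape, Z_iso.shape)  # both (1000, 2)
```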