A large course assignment from our teacher on a Bayesian classification algorithm; anyone interested is welcome to try it

Time:09-27

Assignment 1: Distributed Naive Bayes for Data Classification

Objective: Design a distributed version of Naive Bayes with graphs on Hadoop, and apply the designed algorithm to data classification. Your final report should include the following content:
1. The detailed algorithm for distributed Naive Bayes.
2. The source code for the core algorithm.
3. The experimental results: (1) your experimental environment, such as the CPU and memory of your machines; (2) the classification results; (3) the computation time; (4) the classification accuracy; (5) other findings.
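The counting at the heart of Naive Bayes training splits naturally into a map step and a reduce step, which is one way item 1 could be approached. The following is only a minimal single-process sketch in Python, not the assignment's reference solution. It assumes that the attributes are categorical, that Laplace smoothing (the `alpha` parameter) is acceptable, and that each list in `partitions` stands in for one input split. The function names `map_counts`, `reduce_counts`, `train`, and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def map_counts(record):
    """Map step: emit (key, 1) pairs for one training record.

    record is (attributes, label). Attributes are assumed categorical.
    """
    attributes, label = record
    yield ("class", label), 1
    for index, value in enumerate(attributes):
        yield ("attr", label, index, value), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts of all pairs that share a key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def train(partitions):
    """Run the map step over every partition, then reduce all pairs."""
    pairs = (
        pair
        for partition in partitions  # one partition ~ one input split (assumption)
        for record in partition
        for pair in map_counts(record)
    )
    return reduce_counts(pairs)


def classify(counts, attributes, alpha=1.0):
    """Return the class with the highest smoothed log posterior."""
    class_counts = {key[1]: n for key, n in counts.items() if key[0] == "class"}
    total = sum(class_counts.values())

    # Distinct values seen per attribute index, needed for Laplace smoothing.
    seen_values = defaultdict(set)
    for key in counts:
        if key[0] == "attr":
            seen_values[key[2]].add(key[3])

    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for index, value in enumerate(attributes):
            n = counts.get(("attr", label, index, value), 0)
            denominator = n_label + alpha * len(seen_values[index])
            score += math.log((n + alpha) / denominator)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the reduce step only sums counts, the per-partition results can be combined in any order, which is what makes the training phase easy to distribute. Classification of the unlabeled samples needs only the final count table, so it could run as a map-only pass.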

Schedule:
1. Implement distributed Naive Bayes in fully distributed mode. (another awarding. 2 nd 2015)

Dataset 1: http://archive.ics.uci.edu/ml/datasets.html (Accuracy)
Dataset 2: There are two pairs of datasets (Speed).

(1) The UCI dataset.


(2) The file "1.txt" as the training data set, and the file "2.txt" as the data set to be classified.
"1.txt" contains 5,000,000 training samples in 102 columns: the first column is the ID, columns 2 to 101 are the attributes, and the last column is the class label. "2.txt" contains 500,000 samples to be classified in 101 columns, with the same structure as the first 101 columns of "1.txt".
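The column layout described above can be checked while reading each line. The sketch below is a hedged illustration only. The assignment text does not state the field separator, so splitting on whitespace by default is an assumption, and the function names `parse_training_line` and `parse_unlabeled_line` are hypothetical.

```python
def parse_training_line(line, delimiter=None):
    """Split one training line into (sample_id, attributes, label).

    Expected layout: 1 ID column, 100 attribute columns, 1 class column.
    delimiter=None splits on any whitespace (an assumption about the file).
    """
    fields = line.strip().split(delimiter)
    if len(fields) != 102:
        raise ValueError(f"expected 102 columns, got {len(fields)}")
    return fields[0], fields[1:101], fields[101]


def parse_unlabeled_line(line, delimiter=None):
    """Split one line of the file to classify into (sample_id, attributes).

    Expected layout: 1 ID column followed by 100 attribute columns.
    """
    fields = line.strip().split(delimiter)
    if len(fields) != 101:
        raise ValueError(f"expected 101 columns, got {len(fields)}")
    return fields[0], fields[1:]
```

The values are kept as strings here. Whether they should be converted to numbers depends on the attribute types, which the assignment text does not specify.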


"1.txt" and "2.txt" can be downloaded from:
http://pan.baidu.com/s/1bqYZG
