I collected three kinds of hand gestures, 90 pictures in total, each a 28x28 grayscale image. I then used TensorFlow to build a network imitating the MNIST CNN tutorial and trained on this data.
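Since the full code is long, here is a minimal sketch of what the architecture looks like. This is TF 1.x style, following the MNIST tutorial from memory, not my exact code: the layer widths below are the tutorial defaults, and the only structural changes on my side were the 3-class output and the size of one or two convolution kernels.

```python
import tensorflow as tf

NUM_CLASSES = 3  # three gesture classes instead of MNIST's 10 digits

x = tf.placeholder(tf.float32, [None, 28, 28, 1])   # 28x28 grayscale input
y_ = tf.placeholder(tf.float32, [None, NUM_CLASSES])

# conv1: 5x5 kernels, 32 feature maps, then 2x2 max pooling (28x28 -> 14x14)
w1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[32]))
h1 = tf.nn.relu(tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding='SAME') + b1)
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# conv2: 5x5 kernels, 64 feature maps, then 2x2 max pooling (14x14 -> 7x7)
w2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[64]))
h2 = tf.nn.relu(tf.nn.conv2d(p1, w2, strides=[1, 1, 1, 1], padding='SAME') + b2)
p2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# fully connected layer (dropout omitted for brevity), then a 3-way softmax
flat = tf.reshape(p2, [-1, 7 * 7 * 64])
w_fc = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
b_fc = tf.Variable(tf.constant(0.1, shape=[1024]))
h_fc = tf.nn.relu(tf.matmul(flat, w_fc) + b_fc)

w_out = tf.Variable(tf.truncated_normal([1024, NUM_CLASSES], stddev=0.1))
b_out = tf.Variable(tf.constant(0.1, shape=[NUM_CLASSES]))
logits = tf.matmul(h_fc, w_out) + b_out

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
```

Training this produces the following output: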
Step=0,   Train loss=1.1398, Test accuracy=0.33
Step=30,  Train loss=0.7693, Test accuracy=0.33
Step=60,  Train loss=0.4144, Test accuracy=0.33
Step=90,  Train loss=0.1760, Test accuracy=0.33
Step=120, Train loss=0.0833, Test accuracy=0.33
Step=150, Train loss=0.0477, Test accuracy=0.33
Step=180, Train loss=0.0313, Test accuracy=0.33
Step=210, Train loss=0.0224, Test accuracy=0.33
Step=240, Train loss=0.0170, Test accuracy=0.33
Step=270, Train loss=0.0135, Test accuracy=0.33
Step=300, Train loss=0.0111, Test accuracy=0.33
Step=330, Train loss=0.0093, Test accuracy=0.33
Step=360, Train loss=0.0080, Test accuracy=0.33
Step=390, Train loss=0.0069, Test accuracy=0.33
...and so on: even at Step=2000 the test accuracy is still 0.33.
I only have three gesture classes, so randomly guessing a single class already gives 33% accuracy; this training seems completely useless. I can't figure out where I went wrong. The same architecture recognizes the 10 handwritten digits (MNIST) with very good accuracy, and all I changed were the input, the output, and the sizes of one or two convolution kernels. I'm totally stumped.
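One thing I suspect, since the accuracy is pinned at exactly 1/3 while the training loss goes to nearly zero, is that the network predicts the same class for every test image. A quick check like the sketch below would confirm it (here `sess` is the training session, and `test_images`/`test_labels` stand in for my actual test arrays, which I haven't shown):

```python
import numpy as np

# If every prediction lands in one class, the counts below will show it.
preds = sess.run(tf.argmax(logits, 1), feed_dict={x: test_images})
print("predicted class counts:", np.bincount(preds, minlength=NUM_CLASSES))
print("true class counts:     ",
      np.bincount(np.argmax(test_labels, axis=1), minlength=NUM_CLASSES))
```

If the predicted counts all fall in one bucket, that would at least explain the flat 0.33, even if it doesn't explain the root cause.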