In the "baidu architects take you holding zero based practice deep learning" course, the practice of the first week of course assignments, asked to write a cifar - 10 data sets of data reader, this is the first assignments for this course, I also now to the end of the course, can feel the most experience zero distance paddle a job application architecture,
How out of this? At the back of the job, although the application is more and more complex, but based on the machine learning 0 to fit the premise, in the short three weekly study time, actually can only grasp the idea and process of roughly, did not delve into each algorithm implementation and design ideas,
But the data reader greatly satisfy me "study algorithm implementation" wish, without advice of ta nearly didn't do it, here to share my learning process:
First of all, the assignment required writing a reader that supports both reading and shuffled (out-of-order) output.
For reading there are generally two approaches. The first uses a local dataset and is called like this:

    for counter in range(number_of_local_samples):  # counter is the loop counter, likewise below

Since the number of local samples is known, whether you read in order or in random order, you can use the simplest and most reliable method: traverse everything.
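As a minimal sketch of this first approach (all names and data here are hypothetical, not from the assignment): the whole dataset sits in a local list, so the known-length index range can be traversed directly, either in order or in a random order.

```python
import random

# Hypothetical in-memory dataset: (feature, label) pairs.
dataset = [([0.1 * i], i % 10) for i in range(100)]

def read_all(data, shuffled=False):
    # The length is known, so we can visit every element,
    # optionally walking the indices in a random order.
    indices = list(range(len(data)))
    if shuffled:
        random.shuffle(indices)
    return [data[i] for i in indices]

samples = read_all(dataset, shuffled=True)
```

Every sample is visited exactly once whether or not the order is shuffled, which is exactly the "traverse everything" reliability mentioned above.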
The other approach is PaddlePaddle's: construct a reader, where a reader_creator()-type function returns a reader() function. This function behaves a little like a linked list: it can only be read sequentially and cannot jump straight to, say, the N-th element.
Its defect is that it cannot read from an arbitrary position.
Its great advantage is that it saves space: you read only as much data as you actually use, instead of reading everything in one breath before starting the follow-up work.
The CIFAR-10 dataset sits on Paddle's network; with this approach, the reading task can proceed without first downloading the whole dataset.
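That space-saving, sequential-only behaviour can be demonstrated with a small stand-in (not Paddle code; counting_reader and its logging are invented for illustration): even for a huge nominal dataset, only the items actually requested are ever generated.

```python
def counting_reader(n, log):
    # The nested reader() is a generator: items are produced
    # only when the consumer asks for them.
    def reader():
        for i in range(n):
            log.append(i)  # record that this item was materialized
            yield i
    return reader

log = []
r = counting_reader(1_000_000, log)()  # build the generator; nothing runs yet
first_three = [next(r) for _ in range(3)]
```

Although the reader nominally covers a million items, only the three that were requested ever existed in memory.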
The basic usage of a reader() function is:

    for counter, item in enumerate(reader()):

Each pass of the for loop is equivalent to calling next() on the generator returned by reader(), asking the reader for a new value, after which the body of the loop runs.
The precondition is that reader() must be a function containing yield, that is, a generator.
This style of iteration overturned my understanding of what "for" can do: once a function is built around yield, it can actually be iterated with "for", and it can even be passed to enumerate(), which counts off the yielded values one by one.
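A tiny experiment with this yield mechanic (toy values, nothing Paddle-specific): the function below becomes a generator, each step of the for loop implicitly calls next() on it, and enumerate() keeps count.

```python
def reader():
    # A function containing yield is a generator function; calling it
    # returns a generator that the for loop drives one next() at a time.
    for value in [10, 20, 30]:
        yield value

collected = []
for counter, item in enumerate(reader()):
    collected.append((counter, item))
```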
Just as I was marveling at this magical form, my newly written experimental program threw an error.
My "perfect" plan was:
trainset is the data reader Paddle provides, of reader_creator() type.
The task is to shuffle the data and pack it into batches, generating a new reader.
So it was natural to shuffle first and then batch, which seemed a reasonable design:
1. paddle.reader.shuffle(reader=trainset(), buf_size=1000): shuffle is a reader-modifying function that returns a reader(), used to read buf_size elements of data at a time and shuffle them.
2. train_creator = paddle.batch(paddle.reader.shuffle(reader=trainset(), buf_size=1000), batch_size): pack the shuffled data into batches.
But it did not work.
Take the batch function as an example of a reader-modifying function (source: https://github.com/PaddlePaddle/Paddle/blob/release/1.8/python/paddle/reader/decorator.py#L102). The reason it can take a reader_creator() and return a processed reader() function is that its source is defined as follows:

    def batch(reader, batch_size, drop_last=False):
        def batch_reader():
            ...
        return batch_reader
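That decorator pattern can be filled out into a runnable sketch. This is a simplified stand-in, not Paddle's actual implementation, and sample_reader is an invented toy reader.

```python
def batch(reader, batch_size, drop_last=False):
    # The outer function takes a reader and returns a new,
    # uncalled, nested reader: the decorator pattern above.
    def batch_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf and not drop_last:
            yield buf  # last, possibly partial, batch
    return batch_reader

def sample_reader():
    for i in range(7):
        yield i

batches = list(batch(sample_reader, 3)())
```

Note that batch returns batch_reader itself, uncalled; the caller decides when to start iterating.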
According to the official API documentation: "This interface is a reader decorator; the returned reader packs the data from the input reader into batches of the specified batch_size (batched data)."

Returns: a batched reader

Return type: generator
Moreover, what the batch function accepts in its call is of reader_creator type, yet Paddle's API never clearly gives a definition of reader_creator.
But "[deep learning series] about PaddlePaddle avoid some of the" pit "skills", given a sample program:
Def reader_creator (data, label) :
Def reader () :
For I in xrange (len (data) :
Yield data [I:], int (label [I])
Return reader
Comparing against this, the batch and shuffle functions also belong to the reader_creator type: the reason they can process and return a reader()-type function is that each of them nests a "def reader():"-style definition inside its own function definition.
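To make the nesting concrete, here is that sample completed into runnable Python 3 form (range instead of xrange, plain lists instead of arrays, toy data that I made up):

```python
def reader_creator(data, label):
    # The outer function captures the data; the nested reader()
    # is the generator that actually yields the samples.
    def reader():
        for i in range(len(data)):
            yield data[i], int(label[i])
    return reader

train_reader = reader_creator([[0.0], [0.5], [1.0]], [0, 1, 2])
samples = list(train_reader())
```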
So the reason my code

    train_creator = paddle.batch(paddle.reader.shuffle(reader=trainset(), buf_size=1000), batch_size)

could not run is that the output of shuffle is of reader() type; it is not defined in the nested reader_creator form, and therefore it cannot be consumed by a call to batch.
The scheme I used at the time merely bypassed the problem, and looking back now I am not satisfied with that answer. Wrapping the output of shuffle in reader_creator() form, so that batch can call it: that is the satisfying answer, and it is also the design idea I had from the start.
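That satisfying answer can be sketched without Paddle itself. Below, shuffle and batch are simplified pure-Python stand-ins for paddle.reader.shuffle and paddle.batch, and trainset is a toy reader; the point is that every stage keeps the reader_creator contract, returning an uncalled nested reader that the next decorator can consume directly.

```python
import random

def shuffle(reader, buf_size):
    # Stand-in for paddle.reader.shuffle: buffer up to buf_size
    # items, shuffle the buffer, then yield from it.
    def shuffled_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for b in buf:
                    yield b
                buf = []
        if buf:
            random.shuffle(buf)
            for b in buf:
                yield b
    return shuffled_reader  # uncalled nested reader: reader_creator form

def batch(reader, batch_size):
    # Stand-in for paddle.batch, same nested pattern.
    def batch_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:
            yield buf
    return batch_reader

def trainset():
    for i in range(6):
        yield i

# Because shuffle hands back a reader_creator-style callable,
# batch can consume its output directly.
train_creator = batch(shuffle(trainset, buf_size=6), batch_size=2)
batches = list(train_creator())
```

Each decorator takes a callable and returns a callable, so the stages compose cleanly, which is exactly the design I had wanted from the beginning.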