I was reviewing the docs for Items in Scrapy today, and came across the following line:
Field objects are used to specify metadata for each field...You can specify any kind of metadata for each field. There is no restriction on the values accepted by Field objects.
Within the docs, however, it seems like the only kinds of "metadata" passed to Field objects are functions (in this example, a serializer) or input/output processors.
So I went into Python and tried to make the following Item:
class ScrapyPracticeItem(scrapy.Item):
    name = scrapy.Field()
    age = scrapy.Field('color':'purple')
But this was not accepted syntax either.
I am confused now -- could anyone give me a better definition of what they mean by metadata? Do they only mean transformations of the data in the item? Could it contain more information?
CodePudding user response:
A Field object is simply an alias for the standard Python dictionary. This is the actual description in the Scrapy API:
class scrapy.Field([arg])
The Field class is just an alias to the built-in dict class and doesn’t provide any extra functionality or attributes. In other words, Field objects are plain-old Python dicts. A separate class is used to support the item declaration syntax based on class attributes.
So anything that can be used as a dictionary value can already be assigned to a Scrapy field without passing any parameters to its constructor. The following example shows how you can create a color field:
class MyItem(scrapy.Item):
    color = scrapy.Field()
    age = scrapy.Field()
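Because Field is a dict subclass, metadata can be passed the same way you would build a dict: as keyword arguments or as a mapping. The attempt in the question, Field('color':'purple'), fails only because that is not valid Python call syntax. Here is a minimal sketch, using a stand-in Field class (defined the same way as the scrapy.Field source quoted further down) so it runs without Scrapy installed. The color and unit keys are made-up examples, not keys Scrapy knows about:

```python
class Field(dict):
    """Stand-in for scrapy.Field, which is also just a dict subclass."""


# Keyword arguments become metadata keys, exactly as with dict(...).
age = Field(color="purple")

# A mapping works too, and can be combined with keyword arguments.
height = Field({"unit": "cm"}, color="green")

print(age)             # {'color': 'purple'}
print(height["unit"])  # cm
```

With the real class, the same calls go inside the item declaration (for example age = scrapy.Field(color='purple')), and the stored metadata can be read back through the item class's fields attribute, e.g. MyItem.fields['age'].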
When they say you can set metadata, they mean the Field itself: it is the metadata for the item attribute you are declaring. The option to add a serializer isn't actually handled by the Field. The Field only stores it, and other Scrapy components (the item exporters, in the serializer's case) look the key up and use it.
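To illustrate that split, here is a rough sketch in which the Field merely holds a serializer and a separate piece of code decides to use it. serialize_value is a hypothetical helper written for this example, not a Scrapy function, and Field is again a stand-in for scrapy.Field:

```python
class Field(dict):
    """Stand-in for scrapy.Field (a plain dict subclass)."""


def serialize_value(field, value):
    """Hypothetical consumer: apply the field's serializer if one is stored."""
    # Fall back to returning the value unchanged when no serializer is set.
    serializer = field.get("serializer", lambda v: v)
    return serializer(value)


price = Field(serializer=str)  # metadata whose value is a function
name = Field()                 # no metadata at all

print(repr(serialize_value(price, 9.5)))   # '9.5'
print(repr(serialize_value(name, "Ann")))  # 'Ann'
```

Any other key could be consumed the same way by your own pipelines or exporters, which is why there is no restriction on what a Field may hold.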
This is the actual source code for the scrapy.Field class:
class Field(dict):
    """Container of field metadata"""
    # that is it
All data processing and name assignment is taken care of by one of Scrapy's custom metaclasses.
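As a rough idea of what such a metaclass does, the sketch below gathers every Field declared in a class body into a single fields mapping keyed by attribute name. SketchItemMeta and SketchItem are hypothetical names for this illustration. This is a simplified stand-in, not Scrapy's actual implementation, and it ignores details such as inheriting fields from parent classes:

```python
class Field(dict):
    """Stand-in for scrapy.Field."""


class SketchItemMeta(type):
    """Hypothetical, simplified metaclass that collects Field attributes."""

    def __new__(mcs, class_name, bases, attrs):
        # Index every Field in the class body by its attribute name.
        fields = {n: v for n, v in attrs.items() if isinstance(v, Field)}
        # Everything else stays a normal class attribute.
        remaining = {n: v for n, v in attrs.items() if n not in fields}
        cls = super().__new__(mcs, class_name, bases, remaining)
        cls.fields = fields
        return cls


class SketchItem(metaclass=SketchItemMeta):
    pass


class Person(SketchItem):
    name = Field()
    age = Field(color="purple")


print(Person.fields)  # {'name': {}, 'age': {'color': 'purple'}}
```

The Field objects themselves do nothing here. The metaclass is what turns the class-attribute declaration syntax into a lookup table of names and metadata.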
Scrapy is intentionally structured this way and borrows a lot of its methodology from the Django framework. The Item class and its associated metaclass are designed to work similarly to Django's Model class, which Django uses for communicating with a storage backend, usually a database.
However, because Scrapy items can be extracted and used in innumerable ways, the Item class allows much more flexibility than its Django counterpart, so there really are no limitations on what can be considered metadata or what can be stored in an Item class.