What metadata can actually go into a scrapy.Field object?

Time:07-10

I was reviewing the docs for Items in Scrapy today, and came across the following line:

Field objects are used to specify metadata for each field...You can specify any kind of metadata for each field. There is no restriction on the values accepted by Field objects.

Within the docs, however, it seems like the only kinds of "metadata" passed to Field objects are functions (in this example, a serializer) or input/output processors.

So I went into Python and tried to make the following Item:

class ScrapyPracticeItem(scrapy.Item):
     name = scrapy.Field()
     age = scrapy.Field('color':'purple')

But this was not accepted syntax either.

I am confused now -- could anyone give me a better definition of what they mean by metadata? Do they only mean transformations of the data in the item? Could it contain more information?

CodePudding user response:

A Field object is simply a standard Python dictionary under another name. This is the actual description in the Scrapy API docs:

class scrapy.Field([arg])

The Field class is just an alias to the built-in dict class and doesn’t provide any extra functionality or attributes. In other words, Field objects are plain-old Python dicts. A separate class is used to support the item declaration syntax based on class attributes.

So anything that can be used as a dictionary value can be stored in a Scrapy field. The Field constructor takes the same arguments as dict (keyword arguments, for example), though none are required. The following example shows how you can declare a color field with no metadata at all:

import scrapy

class MyItem(scrapy.Item):
    color = scrapy.Field()
    age = scrapy.Field()
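
Since Field is a dict subclass, metadata can be passed with the usual dict constructor forms. That also explains why the 'color':'purple' attempt in the question fails: it is dict-literal syntax without the braces. A minimal sketch, using a local stand-in Field class (an assumption, so the snippet runs without Scrapy installed; scrapy.Field should behave the same way):

```python
class Field(dict):
    """Stand-in for scrapy.Field: a plain dict subclass."""


# Keyword arguments become metadata keys.
age = Field(color="purple")

# A dict (or any mapping) works too, exactly as with dict() itself.
age_from_mapping = Field({"color": "purple"})

# Values are unrestricted: booleans, callables, lists, and so on.
name = Field(required=True, serializer=str, tags=["person", "pii"])

# Metadata is read back with ordinary dict access.
print(age["color"])
print(age == age_from_mapping)
print(name.get("tags"))
```

The key names here (color, required, tags) are made up for illustration; nothing gives them meaning unless some component looks them up.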

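When the docs say you can specify metadata, they mean the key-value pairs stored in the Field itself: each Field is a dict describing one field of the item. A key such as serializer isn't handled by the Field at all; the Field only stores it, and whichever component cares about that key reads it from there (item exporters, for instance, look up the serializer key).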
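
To illustrate that the Field only stores the metadata while some other component interprets it, here is a rough sketch. The serialize_field helper, the price_with_currency function, and the stand-in Field class are all hypothetical and written for this example; this is not Scrapy's own code, only an approximation of the idea.

```python
class Field(dict):
    """Stand-in for scrapy.Field: a plain dict subclass."""


def serialize_field(field, value):
    """Hypothetical consumer: apply the field's 'serializer' metadata if present."""
    # Fall back to an identity function when no serializer was declared.
    serializer = field.get("serializer", lambda v: v)
    return serializer(value)


def price_with_currency(value):
    """Example serializer that prefixes a currency symbol."""
    return f"$ {value}"


price = Field(serializer=price_with_currency)
name = Field()  # no metadata at all

print(serialize_field(price, 10))
print(serialize_field(name, "Widget"))
```

The Field never calls the serializer itself; only the code that knows about the serializer key does.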

This is the actual source code for the scrapy.Field class:

class Field(dict):
    """Container of field metadata"""

# that is it

Collecting the declared fields and assigning their names is taken care of by one of Scrapy's custom metaclasses.
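
A simplified sketch of how a metaclass can gather Field declarations into a fields mapping on the class. The ItemMeta, Item, and PersonItem classes below are stand-ins written for illustration under that assumption; they are not Scrapy's source, which does more than this.

```python
class Field(dict):
    """Stand-in for scrapy.Field: a plain dict subclass."""


class ItemMeta(type):
    """Illustrative metaclass: moves Field attributes into a `fields` dict."""

    def __new__(mcs, name, bases, attrs):
        fields = {}
        # Inherit fields declared on parent item classes.
        for base in bases:
            fields.update(getattr(base, "fields", {}))
        # Collect Field instances declared in this class body.
        fields.update({k: v for k, v in attrs.items() if isinstance(v, Field)})
        # Keep everything else as normal class attributes.
        new_attrs = {k: v for k, v in attrs.items() if not isinstance(v, Field)}
        new_attrs["fields"] = fields
        return super().__new__(mcs, name, bases, new_attrs)


class Item(metaclass=ItemMeta):
    pass


class PersonItem(Item):
    name = Field()
    age = Field(color="purple", serializer=str)


# The metadata is now reachable by field name through the class.
print(sorted(PersonItem.fields))
print(PersonItem.fields["age"]["color"])
```

In real Scrapy the equivalent lookup is MyItem.fields, which maps each field name to its Field object.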

Scrapy intentionally borrows a lot of its structure and methodology from the Django framework. The Item class and its associated metaclass are designed to work similarly to Django's Model class, which Django uses for communicating with a storage backend, usually a database.

However, because Scrapy items can be extracted and used in innumerable ways, the Item class allows much more flexibility than its Django counterpart, so there really are no limitations on what can be considered metadata or what can be stored in an Item class.
