How can I set default values for scraped results when they return null/None?

Time:11-02

I have scraped some information from a website where some fields are missing, so the scraper returns null for them. Is there a way to output a default value in such cases, possibly a different one per field? The sample script is below.

script.py

import scrapy

class UfcscraperSpider(scrapy.Spider):
    name = 'ufcscraper'

    start_urls = ['http://ufcstats.com/statistics/fighters?char=a']

    def parse(self, response):
        for user_info in response.css(".b-statistics__table-row")[2::]:
            result = {
                "fname": user_info.css("td:nth-child(1) a::text").get(),
                "lname": user_info.css("td:nth-child(2) a::text").get(),
                "nname": user_info.css("td:nth-child(3) a::text").get(),
                "height": user_info.css("td:nth-child(4)::text").get().strip(),
                "weight": user_info.css("td:nth-child(5)::text").get().strip(),
                "reach": user_info.css("td:nth-child(6)::text").get().strip(),
                "stance": user_info.css("td:nth-child(7)::text").get().strip(),
                "win": user_info.css("td:nth-child(8)::text").get().strip(),
                "lose": user_info.css("td:nth-child(9)::text").get().strip(),
                "draw": user_info.css("td:nth-child(10)::text").get().strip()
            }

            # yield inside the loop so that every row is emitted, not just the last one
            yield result

For instance, the nname field in the first row is null, while stance is an empty string (""). How can I set a default value for such cases?

sample result

[
{"fname": "Tom", "lname": "Aaron", "nname": null, "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]

CodePudding user response:

You could either put the logic to replace any "" in your function, or you could loop through the result and, whenever you come across "", replace it with whatever you'd like as the default.

data = [
{"fname": "Tom", "lname": "Aaron", "nname": "", "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]


# Replace every empty or missing (None) value with a default, in place.
for each in data:
    for k, v in each.items():
        if v is None or v == '':
            each[k] = 'DEFAULT'

Output:

print(data)
[
{'fname': 'Tom', 'lname': 'Aaron', 'nname': 'DEFAULT', 'height': '--', 'weight': '155 lbs.', 'reach': '--', 'stance': 'DEFAULT', 'win': '5', 'lose': '3', 'draw': '0'}, 
{'fname': 'Danny', 'lname': 'Abbadi', 'nname': 'The Assassin', 'height': '5\' 11"', 'weight': '155 lbs.', 'reach': '--', 'stance': 'Orthodox', 'win': '4', 'lose': '6', 'draw': '0'}
]
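
The other option mentioned above, putting the logic in the function itself, could look roughly like the sketch below. It is only an illustration: the helper name with_default, the FIELD_DEFAULTS mapping, and the default strings are all made up for this example and are not part of Scrapy. It relies on the fact that .get() returns None when a selector matches nothing, which is why the value is checked before .strip() is called.

```python
# Hypothetical per-field defaults; the names and values are illustrative only.
FIELD_DEFAULTS = {"nname": "No nickname", "stance": "Unknown"}
FALLBACK = "N/A"  # used for any field without its own entry


def with_default(field, value):
    """Return the stripped value, or the field's default when it is missing."""
    # Guard against None before stripping; None and "" are both "missing".
    cleaned = value.strip() if value is not None else ""
    if cleaned == "":
        return FIELD_DEFAULTS.get(field, FALLBACK)
    return cleaned


# Inside parse(), each field would be wrapped like this (not run here):
#   "nname": with_default("nname", user_info.css("td:nth-child(3) a::text").get()),

# Stand-alone demonstration with plain values instead of selectors.
row = {"fname": "Tom", "nname": None, "height": "--", "stance": "  "}
cleaned_row = {k: with_default(k, v) for k, v in row.items()}
print(cleaned_row)
```

Note that placeholder values such as "--" are left untouched here; if those should also count as missing, the check in with_default would need to be extended.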