I am trying to select a price with a CSS selector on the following page:
https://www.funda.nl/en/koop/nieuwegein/huis-42656543-wattbaan-22/
The CSS path I would like to select is 'strong.object-header__price'. In the Scrapy shell, this corresponds to the following line of code and output:
response.css('strong.object-header__price').xpath('normalize-space()').extract()
['€ 675,000 k.k.']
However, I would only like to select the amount, 675,000.
For xpath I know how to do this, namely:
response.xpath("substring-before(substring(//*[@id='content']/div/div/div[1]/section[5]/div/dl[1]/dd[1]/span[1]/text(),'3','25'),' ')").extract()
Can someone please advise me how to do the same thing with the CSS selector? I could not find anything online that explains how, hence this question.
Thanks in advance.
Answer:
Here are a few options:
1. Select substring (probably what you wanted):
In [1]: price = response.css('.object-header__price::text').get()[2:-5]
In [2]: price
Out[2]: '675,000'
2. Use replace:
In [1]: price = response.css('.object-header__price::text').get()
In [2]: price = price.replace('€ ', '')
In [3]: price = price.replace(' k.k.', '')
In [4]: price
Out[4]: '675,000'
3. Use a regex:
In [1]: import re
In [2]: price = response.css('.object-header__price::text').get()
In [3]: price = re.findall(r'(\d+,\d+)', price)
In [4]: price[0]
Out[4]: '675,000'
4. Get it from the script:
In [1]: import json
In [2]: price = response.css('head > script[type="application/ld+json"]::text').get()
In [3]: script_data = json.loads(price)
In [4]: script_data['offers']['price']
Out[4]: '675000'
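If you need the amount as a number rather than a string, the regex approach can be wrapped in a small helper. The sketch below is only an illustration: the name `parse_price` is made up for this answer, and it assumes the price text uses commas as thousands separators, as in '€ 675,000 k.k.'. It also handles prices with more than one comma group, which the pattern in option 3 would cut short.

```python
import re


def parse_price(text):
    """Return the first number found in `text` as an int, or None if there is none.

    Hypothetical helper: assumes commas are thousands separators.
    """
    # A digit followed by any run of digits and commas, e.g. '675,000' or '1,250,000'.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop the separators before converting to an integer.
    return int(match.group().replace(",", ""))


print(parse_price("€ 675,000 k.k."))
```

In a spider you would pass it the string returned by response.css('.object-header__price::text').get(). Scrapy selectors also have a re_first() method, so response.css('.object-header__price::text').re_first(r'[\d,]+') should give you the '675,000' string directly without importing re.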