Select amount in CSS selector

Time:12-13

I am trying to select a price with a CSS selector on the following page:

https://www.funda.nl/en/koop/nieuwegein/huis-42656543-wattbaan-22/

The CSS path I want to select is 'strong.object-header__price'. In the Scrapy shell, this corresponds to the following line of code and output:

response.css('strong.object-header__price').xpath('normalize-space()').extract()
['€ 675,000 k.k.']

However, I would only like to select the amount, 675,000.

With XPath I know how to do this:

response.xpath("substring-before(substring(//*[@id='content']/div/div/div[1]/section[5]/div/dl[1]/dd[1]/span[1]/text(),'3','25'),' ')").extract()

Can someone please advise how I can do the same with the CSS selector? I could not find this anywhere online, hence this question.

Thanks in advance.

Answer:

Here are a few options:

1. Slice the string (probably what you wanted):

In [1]: price = response.css('.object-header__price::text').get()[2:-5]

In [2]: price
Out[2]: '675,000'

2. Use replace:

In [1]: price = response.css('.object-header__price::text').get()

In [2]: price = price.replace('€ ', '')

In [3]: price = price.replace(' k.k.', '')

In [4]: price
Out[4]: '675,000'

3. Use a regex:

In [1]: import re

In [2]: price = response.css('.object-header__price::text').get()

In [3]: price = re.findall(r'(\d+,\d+)', price)

In [4]: price[0]
Out[4]: '675,000'

4. Get it from the script:

In [1]: import json

In [2]: script = response.css('head > script[type="application/ld+json"]::text').get()

In [3]: script_data = json.loads(script)

In [4]: script_data['offers']['price']
Out[4]: '675000'
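
If you need the amount as a number rather than a string, options 3 and 4 can be wrapped in small helpers. This is only a sketch: the function names are made up for illustration, the sample strings stand in for whatever Scrapy returns on the real page, and it assumes the JSON-LD is a single object with an 'offers' key, as in the output above.

```python
import json
import re


def amount_from_text(text):
    """Return the first run of digits and commas in `text` as an int, or None."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(0).replace(",", ""))


def amount_from_json_ld(script_text):
    """Return offers.price from a JSON-LD string as an int, or None if absent."""
    data = json.loads(script_text)
    price = data.get("offers", {}).get("price")
    return int(price) if price is not None else None


# Hypothetical inputs standing in for the values Scrapy would return.
print(amount_from_text("€ 675,000 k.k."))
print(amount_from_text("€ 1,250,000 v.o.n."))
print(amount_from_json_ld('{"offers": {"price": "675000"}}'))
```

Unlike the fixed slice in option 1, the regex helper does not depend on the length of the suffix after the amount, so a label other than 'k.k.' should not break it.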
