optimise css selector for scrapy 2.5.1

Time:12-20

HTML section:

<div class="team-name" style="white-space: nowrap">
 <span>Team A</span>
 <span>13</span>
 <...>
</div>

To select the parsed data, I currently use:

response.css("div.team-name span::text").getall()

Out[22]: ['Team A', ' 13', ' : ', '3 ', 'Team B']

The type of this output is: <class 'list'>

The next thing I need to do is convert the string output into an int, so array[1] and array[3] in this example. The problem is that the strings inside the array contain whitespace. What is the fastest way to remove the whitespace and get the numbers as int? (I suspect the nowrap style is causing trouble here too.)

I have tried str.replace() and str.split() without success, handling the values outside the array in separate variables. (Alternatively, an XPath solution would also work here.)

CodePudding user response:

You can remove the surrounding whitespace with the str.strip() method.
Then you can check whether the stripped string is a number with str.isdigit() and convert it with int().

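datas = response.css("div.team-name span::text").getall()
for data in datas:
    data = data.strip()   # drop leading/trailing whitespace
    if data.isdigit():    # skips the team names and the ":" separator
        print(int(data))  # convert the cleaned string to int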
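
Since the question asks for int values, the same strip-and-check idea can be wrapped in a small helper that returns the numbers instead of printing them. This is only a minimal sketch: it runs on the plain list shown in the question rather than on a live response, and the name extract_scores is made up for illustration.

```python
def extract_scores(texts):
    """Return the integer values found in a list of scraped text fragments."""
    scores = []
    for text in texts:
        cleaned = text.strip()  # remove leading/trailing whitespace
        if cleaned.isdigit():   # keep only fragments that consist of digits
            scores.append(int(cleaned))
    return scores


# Sample list copied from the output shown in the question.
sample = ['Team A', ' 13', ' : ', '3 ', 'Team B']
home_score, away_score = extract_scores(sample)
print(home_score, away_score)
```

In a spider, the list passed in would be the result of response.css("div.team-name span::text").getall(). Unpacking into two names assumes exactly two numeric fragments per row; if that may not hold, keep the returned list and check its length first.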