Binary numbers returns all 0

Time:12-18

I'm trying to convert my web scraping data into binary numbers: if the class name contains yes, the value should be 1, and if it contains no, it should be 0. When I print binary_value, I get 0 for every element, even for those whose class contains yes. I'm not sure what I'm missing. Thanks in advance.

import cfscrape
from bs4 import BeautifulSoup

scraper = cfscrape.create_scraper()
response = scraper.get('https://www.hipflat.co.th/projects/ruam-rudee-penthouse-lvukdc')
soup = BeautifulSoup(response.text, 'html.parser')

divs = soup.find_all('div', class_=lambda x: x and ("amenities__icon amenities__icon--yes" in x or "amenities__icon amenities__icon--no" in x))

# Convert the elements to binary numbers
for div in divs:
  if "amenities__icon amenities__icon--yes" in div['class']:
    binary_value = 1
  else:
    binary_value = 0

  print(binary_value)

This is what appears in the terminal when I print(div) instead:

<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>

CodePudding user response:

Note: it would be shorter to get divs using .select with a CSS selector:

divs = soup.select('div.amenities__icon:is(.amenities__icon--yes, .amenities__icon--no)')

The problem is this condition, which is never true:

  if "amenities__icon amenities__icon--yes" in div['class']:

You can just check if "amenities__icon--yes" in div['class'], since every div should have amenities__icon anyway; the lambda expression ensures it.

None of the items in div['class'] (which can be expected to be a list of strings) will have any spaces, since HTML classes are separated by spaces, and when BeautifulSoup parses them, they are split into a list. (It becomes quite obvious if you just print the classes with for div in divs: print(div['class']).)
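The difference between list membership and substring search can be seen without any scraping. This is a minimal sketch in which the hard-coded classes list is an assumption standing in for what div['class'] looks like after parsing; the variable names are made up for illustration.

```python
# Stand-in for div['class'] after parsing: one string per class, no spaces.
classes = ["amenities__icon", "amenities__icon--yes"]

# "in" on a list compares whole items, so the combined string never matches.
combined_in_list = "amenities__icon amenities__icon--yes" in classes

# A single class name is an item of the list, so this check succeeds.
single_in_list = "amenities__icon--yes" in classes

# Joining the list gives one string again, where substring search works.
combined_in_joined = "amenities__icon amenities__icon--yes" in " ".join(classes)

print(combined_in_list, single_in_list, combined_in_joined)  # False True True
```

The first check is the one from the question, which is why every element ended up as 0.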

So, the correct way to check for both the amenities__icon and amenities__icon--yes classes would be:

  if "amenities__icon" in div['class'] and "amenities__icon--yes" in div['class']:

or, if you wanted that specific order for some reason, you could join the classes back into a single string before checking:

  if "amenities__icon amenities__icon--yes" in " ".join(div['class']):

Using a list comprehension,

[int("amenities__icon amenities__icon--yes" in " ".join(d['class'])) for d in divs] # OR
# [1 if "amenities__icon amenities__icon--yes" in " ".join(d['class']) else 0 for d in divs]

returns [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0].
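To try the whole idea offline, here is a rough sketch that runs against a small inline HTML string instead of the live page. The markup is hypothetical (the real page will differ), and it uses the simpler single-class membership check rather than the join.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page (not the real site HTML).
html = """
<div class="amenities__icon amenities__icon--yes"></div>
<div class="amenities__icon amenities__icon--no"></div>
<div class="amenities__icon amenities__icon--yes"></div>
<div class="unrelated"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the divs that carry a yes/no modifier class.
divs = soup.select("div.amenities__icon--yes, div.amenities__icon--no")

# div["class"] is a list of class names, so test membership of a single name.
binary_values = [int("amenities__icon--yes" in div["class"]) for div in divs]

print(binary_values)  # [1, 0, 1]
```

The unrelated div is skipped by the selector, so only the three amenity icons are converted.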
