I'm trying to convert my web scraping data into binary numbers. Basically, if the class name contains yes, the value should be 1, and if it contains no, the value should be 0. When I print binary_value, I get 0 for every element, even for the ones whose class contains yes. I'm not sure what I'm missing. Any help is highly appreciated.
import cfscrape
from bs4 import BeautifulSoup

scraper = cfscrape.create_scraper()
response = scraper.get('https://www.hipflat.co.th/projects/ruam-rudee-penthouse-lvukdc')
soup = BeautifulSoup(response.text, 'html.parser')
divs = soup.find_all('div', class_=lambda x: x and ("amenities__icon amenities__icon--yes" in x or "amenities__icon amenities__icon--no" in x))

# Convert the elements to binary numbers
for div in divs:
    if "amenities__icon amenities__icon--yes" in div['class']:
        binary_value = 1
    else:
        binary_value = 0
    print(binary_value)
This is what appears in the terminal when I print(div):
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
<div ></div>
CodePudding user response:
Note: it would be shorter to get divs using .select with CSS selectors:
divs = soup.select('div.amenities__icon:is(.amenities__icon--yes, .amenities__icon--no)')
if "amenities__icon amenities__icon--yes" in div['class']:
You can actually just check if "amenities__icon--yes" in div['class'], since every div should have amenities__icon anyway; the lambda expression ensures it.
None of the items in div['class'] (which can be expected to be a list of strings) will contain a space: HTML classes are separated by spaces, so when BeautifulSoup parses the attribute it splits it into a list of individual class names. The two-class string from the original check therefore never equals any single item, and the condition is always False. (This becomes quite obvious if you just print the classes with for div in divs: print(div['class']).)
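As a minimal sketch of why the original check fails, the snippet below imitates the parsed class list with a plain Python list. The variable names are only illustrative, and str.split() is used here as a stand-in for the splitting BeautifulSoup performs on the class attribute.

```python
# Stand-in for the raw class attribute of one matching div (assumed value).
class_attr = "amenities__icon amenities__icon--yes"

# Imitates div['class']: a list of individual class names, none with spaces.
classes = class_attr.split()

# Membership on a list compares whole items, so the two-class string never matches.
whole_string_in_list = "amenities__icon amenities__icon--yes" in classes

# A single class name is an item of the list, so this check succeeds.
single_class_in_list = "amenities__icon--yes" in classes

# Joining the list back into one string makes the substring check work again.
whole_string_in_joined = "amenities__icon amenities__icon--yes" in " ".join(classes)

print(classes)
print(whole_string_in_list, single_class_in_list, whole_string_in_joined)
```

The first check is the one from the question, which is why every binary_value came out as 0.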
So, the correct way to check for both the amenities__icon and amenities__icon--yes classes would be
if "amenities__icon" in div['class'] and "amenities__icon--yes" in div['class']:
or, if you wanted that specific order for some reason, you could join the classes back into a single string before checking
if "amenities__icon amenities__icon--yes" in " ".join(div['class']):
Using a list comprehension,
[int("amenities__icon amenities__icon--yes" in " ".join(d['class'])) for d in divs] # OR
# [1 if "amenities__icon amenities__icon--yes" in " ".join(d['class']) else 0 for d in divs]
would return [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0].
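Putting the pieces together, here is a small self-contained sketch that runs without hitting the website. The HTML string is a hypothetical stand-in for the scraped page (the real markup may differ), and it uses the simpler single-class check rather than the joined-string version.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page; not the real page source.
html = """
<div class="amenities__icon amenities__icon--yes"></div>
<div class="amenities__icon amenities__icon--no"></div>
<div class="amenities__icon amenities__icon--yes"></div>
<div class="unrelated"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select only the amenity icons that carry a yes or no modifier class.
divs = soup.select("div.amenities__icon:is(.amenities__icon--yes, .amenities__icon--no)")

# d["class"] is a list of class names, so test for the single "yes" class.
binary_values = [int("amenities__icon--yes" in d["class"]) for d in divs]

print(binary_values)  # [1, 0, 1]
```

The div with the unrelated class is skipped by the selector, so only the three amenity icons contribute a value.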