The HTML I'm scraping has multiple elements with the same class. It looks like this:
<span aria-hidden="true" class="v-align-middle social-details-social-counts__reactions-count">56</span>
<span aria-hidden="true" class="v-align-middle social-details-social-counts__reactions-count">45</span>
<span aria-hidden="true" class="v-align-middle social-details-social-counts__reactions-count">10</span>
And so on...
I want to pull the 'innerText' ("56", "45", "10" respectively) from each of the first (20) instances of this class. My thought was that I could pull each by their index by using find_elements_by_xpath, find_elements_by_class_name, or find_elements_by_css_selector.
If I were to do it by css_selector, this is what I think it would look like:
likes1 = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count > span:nth-of-type(1)")
likes2 = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count > span:nth-of-type(2)")
likes3 = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count > span:nth-of-type(3)")
And so on to (20)...
Before I start appending/printing the data, I also want to iterate through these in a loop. (If I'm fetching an individual element by index, I don't understand why I need a loop when I didn't return a list, but that's another matter.) So it would look like this:
for l1 in likes1:
    data.append({
        "Likes1": l1.text
    })
for l2 in likes2:
    data.append({
        "Likes2": l2.text
    })
for l3 in likes3:
    data.append({
        "Likes3": l3.text
    })
And so on down to l20.text...
print(data)
driver.close()
Since this is a long-scrolling page, I guess incorporating multiple window.scrolls or Keys.PAGE_DOWN will need to be there as well. Also another matter...
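Something like this is what I have in mind for the scrolling part (a rough sketch; the repeat count and the fixed sleep are guesses, and an explicit wait on new content would be more reliable):

```python
import time

def scroll_to_load(driver, times=5, pause=1.0):
    # Scroll to the bottom repeatedly so lazy-loaded posts render.
    for _ in range(times):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude; an explicit wait would be more robust
```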
My question is: Am I doing this correctly? Is there a better/more efficient way, for example, without having to loop through each? Thanks in advance for telling me where I'm going wrong. These are the woes of growth!
CodePudding user response:
If this class,
v-align-middle social-details-social-counts__reactions-count
matches 20 or more elements and you only want to scrape the first 20, use a counter with find_elements.
Sample code:
counter = 1
data = []
likes = driver.find_elements_by_css_selector("span.v-align-middle.social-details-social-counts__reactions-count")
for like in likes:
    if counter <= 20:
        print(like.text)
        data.append(like.text)
        counter = counter + 1
In case you do not want a counter and would rather scrape everything, use:
data = []
likes = driver.find_elements_by_css_selector("span.v-align-middle.social-details-social-counts__reactions-count")
for like in likes:
    print(like.text)
    data.append(like.text)
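A small aside: since find_elements returns a plain Python list, the counter can also be replaced with a slice. A sketch (the helper name is made up; it only needs objects with a .text attribute, so you can try it without a browser):

```python
def first_n_texts(elements, n=20):
    # Take the text of at most the first n elements; slicing never
    # raises, even if fewer than n elements were found.
    return [el.text for el in elements[:n]]

# With Selenium it would be used like this (driver setup omitted):
# likes = driver.find_elements_by_css_selector(
#     "span.v-align-middle.social-details-social-counts__reactions-count")
# data = first_n_texts(likes, 20)
```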
CodePudding user response:
You can do this with a single loop instead of writing a separate loop for each element. Try it like below:
data = []
allLikes = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count")
for i, like in enumerate(allLikes, start=1):
    data.append({
        "Likes" + str(i): like.text
    })
print(data)
driver.close()