The HTML I'm scraping has multiple elements with the same class. It looks like this:
<span aria-hidden="true" class="v-align-middle social-details-social-counts__reactions-count">56</span>
<span aria-hidden="true" class="v-align-middle social-details-social-counts__reactions-count">45</span>
<span aria-hidden="true" class="v-align-middle social-details-social-counts__reactions-count">10</span>
And so on...
I want to pull the 'innerText' ("56", "45", "10" respectively) from each of the first (20) instances of this class. My thought was that I could pull each by their index by using find_elements_by_xpath, find_elements_by_class_name, or find_elements_by_css_selector.
If I were to do it by css_selector, this is what I think it would look like:
likes1 = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count > span:nth-of-type(1)")
likes2 = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count > span:nth-of-type(2)")
likes3 = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count > span:nth-of-type(3)")
And so on to (20)...
Before I start appending/printing the data, I also want to iterate through these in a loop. (If I'm fetching an individual element by index, I don't understand why I need a loop when I didn't return a list, but that's another matter.) So it would look like this:
for l1 in likes1:
    data.append({
        "Likes1": l1.text
    })
for l2 in likes2:
    data.append({
        "Likes2": l2.text
    })
for l3 in likes3:
    data.append({
        "Likes3": l3.text
    })
And so on down to l20.text...
print(data)
driver.close()
Since this is a long-scrolling page, I guess incorporating multiple window.scrolls or Keys.PAGE_DOWN will need to be there as well. Also another matter...
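Something like this is what I have in mind for the scrolling part (a rough sketch; the repeat count and the fixed sleep are guesses, and an explicit wait on new content would be more reliable):

```python
import time

def scroll_to_load(driver, times=5, pause=1.0):
    # Scroll to the bottom repeatedly so lazy-loaded posts render.
    for _ in range(times):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude; an explicit wait would be more robust
```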
My question is: Am I doing this correctly? Is there a better/more efficient way, for example, without having to loop through each? Thanks in advance for telling me where I'm going wrong. These are the woes of growth!
CodePudding user response:
If this class,
v-align-middle social-details-social-counts__reactions-count
matches 20 or more elements and you only want to scrape the first 20, use a counter with find_elements.
Sample code:
counter = 1
data = []
likes = driver.find_elements_by_css_selector("span.v-align-middle.social-details-social-counts__reactions-count")
for like in likes:
    if counter <= 20:
        print(like.text)
        data.append(like.text)
        counter = counter + 1
In case you do not want a counter and would rather scrape everything, use:
data = []
likes = driver.find_elements_by_css_selector("span.v-align-middle.social-details-social-counts__reactions-count")
for like in likes:
    print(like.text)
    data.append(like.text)
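A small aside: since find_elements returns a plain Python list, the counter can also be replaced with a slice. A sketch (the helper name is made up; it only needs objects with a .text attribute, so you can try it without a browser):

```python
def first_n_texts(elements, n=20):
    # Take the text of at most the first n elements; slicing never
    # raises, even if fewer than n elements were found.
    return [el.text for el in elements[:n]]

# With Selenium it would be used like this (driver setup omitted):
# likes = driver.find_elements_by_css_selector(
#     "span.v-align-middle.social-details-social-counts__reactions-count")
# data = first_n_texts(likes, 20)
```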
CodePudding user response:
You can do this with a single loop instead of writing a separate loop for each element. Try it like below:
data = []
allLikes = driver.find_elements_by_css_selector(".v-align-middle.social-details-social-counts__reactions-count")
for i, like in enumerate(allLikes, start=1):
    data.append({
        "Likes" + str(i): like.text
    })
print(data)
driver.close()