Facebook Group Post Scraping Using Selenium Only Returns One Post

Time:11-10

I'm in the process of building a Facebook Group scraper. I have managed to write the code to log in and scrape the poster's name and the post description, but for some reason my code only returns one result instead of all the posts on the page, as I would like it to.

Here's my code:

for result in driver.find_elements_by_xpath('//div[@]'):
    poster = result.find_element_by_xpath('//a[@]/strong/span').text
    description = result.find_element_by_xpath('//div[@]').text

    groupcomments.append({
        'poster' : poster,
        'description' : description,
    })

    print(groupcomments)

Here's a snippet of the Facebook page source (you can view it yourself here: https://www.facebook.com/groups/286175922122417)

<div data-pagelet="GroupFeed"><div class="j83agx80 l9j0dhe7 k4urcfbm"><div class="rq0escxv l9j0dhe7 du4w35lb hybvsw6c io0zqebd m5lcvass fbipl8qg nwvqtn77 k4urcfbm ni8dbmo4 stjgntxs sbcfpzgs" style="border-radius: max(0px, min(8px, ((100vw - 4px) - 100%) * 9999)) / 8px;"><div class="ihqw7lf3"><div class="rq0escxv l9j0dhe7 du4w35lb j83agx80 cbu4d94t pfnyh3mw d2edcug0 e5nlhep0 aodizinl"><div class="rq0escxv l9j0dhe7 du4w35lb j83agx80 cbu4d94t buofh1pr tgvbjcpo"><div class="rq0escxv l9j0dhe7 du4w35lb j83agx80 cbu4d94t pfnyh3mw d2edcug0 hv4rvrfc dati1w0a"><div class="j83agx80 cbu4d94t ew0dbk1b irj2b8pg"><div class="qzhwtbm6 knvmm38d"><span class="d2edcug0 hpfvmrgz qv66sw1b c1et5uql oi732d6d ik7dh3pa ht8s03o8 a8c37x1j keod5gw0 nxhoafnm aigsh9s9 d9wwppkn fe6kdd0r mau55g9w c8b282yb iv3no6db a5q79mjw g1cxx5fr lrazzd5p oo9gr5id" dir="auto"><div class="rq0escxv l9j0dhe7 du4w35lb j83agx80 pfnyh3mw i1fnvgqd bp9cbjyn owycx6da btwxx1t3 jeutjz8y"><div class="rq0escxv l9j0dhe7 du4w35lb j83agx80 cbu4d94t g5gj957u d2edcug0 hpfvmrgz rj1gh0hx buofh1pr"

Any ideas on how to get all the information I'm looking for? Thanks in advance :)
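A likely cause of the symptom above is XPath scoping. In Selenium, an expression that starts with `//` searches the whole document even when it is called on an element such as `result`, so every iteration picks up the first poster and description on the page. Prefixing the expression with a dot (`.//`) makes it relative to `result`. Selenium needs a live browser, so this sketch illustrates the same scoping difference offline with Python's standard `xml.etree.ElementTree`; the markup, the `scrape` helper and its `scoped` flag are hypothetical stand-ins, not Facebook's real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the group feed.
FEED = """
<feed>
  <post><author>Alice</author><body>First post</body></post>
  <post><author>Bob</author><body>Second post</body></post>
  <post><author>Carol</author><body>Third post</body></post>
</feed>
"""

root = ET.fromstring(FEED.strip())


def scrape(scoped):
    """Return one dict per <post>, searching from the post (scoped) or the root."""
    comments = []
    for post in root.findall(".//post"):
        # scoped=True: search inside the current post (like ".//" in Selenium).
        # scoped=False: search from the document root (like "//"), which always
        # finds the first author/body on the page.
        context = post if scoped else root
        comments.append({
            "poster": context.find(".//author").text,
            "description": context.find(".//body").text,
        })
    return comments


print(scrape(scoped=False))  # Alice's post three times
print(scrape(scoped=True))   # Alice, Bob and Carol
```

Applied to the question's loop, this would mean prefixing each inner XPath expression with a dot. Note that the `find_element_by_*` / `find_elements_by_*` helpers were removed in Selenium 4.3; the current form is `result.find_element(By.XPATH, ".//...")` with `from selenium.webdriver.common.by import By`. Printing `groupcomments` once, after the loop, also avoids printing the growing list on every iteration.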

Answer:

I managed to scrape what I wanted by using BeautifulSoup to parse the page source and locate the information by its CSS class names. This is not a 100% foolproof solution, because Facebook's auto-generated class names can change, but they are easy to update in the code, so I guess it's better than nothing...

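from bs4 import BeautifulSoup

# `driver` is the logged-in Selenium WebDriver, already on the group page.
# Facebook loads more posts as you scroll, so re-run this after scrolling.
soup = BeautifulSoup(driver.page_source, "html.parser")
all_posts = soup.find_all("div", {"class": "du4w35lb k4urcfbm l9j0dhe7 sjgh65i0"})
for post in all_posts:
    link = post.find("a", {"class": "oajrlxb2 g5ia77u1 qu0x051f esr5mh6w e9989ue4 r7d6kgcz rq0escxv nhd2j8a9 nc684nl6 p7hjln8o kvgmc6g5 cxmmr5t8 oygrvhab hcukyx3x jb3vyjys rz4wbd8a qt6c0cv9 a8nywdso i1ao9s8h esuyzwwr f1sip0of lzcic4wl oo9gr5id gpro0wi8 lrazzd5p"})
    # find() returns None when this post has no matching link
    name = link.get_text() if link is not None else "not found"
    print(name)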
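In the code above, passing a space-separated string such as `"du4w35lb k4urcfbm l9j0dhe7 sjgh65i0"` as the class value makes BeautifulSoup match that exact attribute string, so a reordered or extra class makes the lookup come back empty. BeautifulSoup's CSS selector form, e.g. `soup.select("div.du4w35lb.k4urcfbm.l9j0dhe7.sjgh65i0")`, only requires each class to be present. As a standard-library-only sketch of that order-independent matching, the hypothetical `LinkTextCollector` below uses `html.parser` to collect the text of every `<a>` whose class list contains a required set of classes. The sample markup is made up, and unlike the answer it does not first restrict the search to post containers.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the text of <a> tags whose class list includes every required class."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.inside_match = False
        self.current = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Subset test: class order and extra classes do not affect the match.
        if self.required <= classes:
            self.inside_match = True
            self.current = []

    def handle_data(self, data):
        # Text inside nested tags such as <strong><span> is collected too.
        if self.inside_match:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.inside_match:
            self.names.append("".join(self.current).strip())
            self.inside_match = False


# Made-up markup: same classes in a different order, plus one non-matching link.
SAMPLE = (
    '<div class="sjgh65i0 du4w35lb"><a class="oajrlxb2 lrazzd5p" href="#">'
    "<strong><span>Alice</span></strong></a></div>"
    '<div class="du4w35lb extra"><a class="lrazzd5p x oajrlxb2" href="#">'
    "<strong><span>Bob</span></strong></a></div>"
    '<div><a class="oajrlxb2" href="#">Ignored</a></div>'
)

collector = LinkTextCollector({"oajrlxb2", "lrazzd5p"})
collector.feed(SAMPLE)
collector.close()
print(collector.names)  # ['Alice', 'Bob']
```

With BeautifulSoup installed, the equivalent one-liner would be a `select` call with the classes chained as `a.oajrlxb2.lrazzd5p`, which tolerates the same reordering.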

If you want a more in-depth tutorial, I also made a video showing how I wrote it; you can watch it here (the complete code is in the video description as well).
