WebScrape -Getting the href

Time:12-11

At the end of each row of this page, there is a "View Posters" link that contains a URL.

My code already pulls the first link in each row without problems and stores it as "ur".

I am not sure how to pull the "View Posters" URL.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import pandas as pd
import requests
from bs4 import BeautifulSoup

val=[]

absinfo=[]
sesinfo=[]

url = 'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&filters={"sessionType":[{"key":"Poster Session"}]}'

# Static fetch (not used below; the rows are read through Selenium)
res=requests.get(url)
soup=BeautifulSoup(res.content,'html.parser')

driver = webdriver.Chrome()
driver.get(url)
time.sleep(4)  # crude wait for the page to finish rendering

# One element per row of the program guide
productlist = driver.find_elements(By.XPATH, ".//div[@class='session-card']")
#times = soup.select('.time')

for b in productlist:
    # First <a> in the row; this is the link I can already get
    ur = b.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')

CodePudding user response:

If you want to use Selenium, try the following XPath expressions to get both href links from each item in productlist.

from selenium.webdriver.common.by import By

# Single quotes outside, because the URL itself contains double quotes
driver.get('https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/286/program-guide/search?q=&filters={"sessionType":[{"key":"Poster Session"}]}')

productlist = driver.find_elements(By.XPATH, ".//div[@class='session-card']")

for item in productlist:
    # Link on the session title
    print("Url 1 :" + item.find_element(By.XPATH, ".//span[@data-cy='sessionTitle']//a").get_attribute('href'))
    # Link that wraps the span whose text is "View Posters"
    print("View Poster :" + item.find_element(By.XPATH, ".//a[.//span[text()='View Posters']]").get_attribute('href'))

Output:

Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14170
View Poster :https://meetings.asco.org/session/14170
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14145
View Poster :https://meetings.asco.org/session/14145
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14169?presentation=205955
View Poster :https://meetings.asco.org/session/14169
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14168
View Poster :https://meetings.asco.org/session/14168
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14450
View Poster :https://meetings.asco.org/session/14450
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14163
View Poster :https://meetings.asco.org/session/14163
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14449
View Poster :https://meetings.asco.org/session/14449
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14451
View Poster :https://meetings.asco.org/session/14451
Url 1 :https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14166
View Poster :https://meetings.asco.org/session/14166
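
In the output above, each "View Poster" link looks like the numeric session ID from "Url 1" appended to https://meetings.asco.org/session/. If that pattern holds for every row (an assumption based only on these nine rows, not on anything the site documents), the second link could be derived from the first without a second lookup. A minimal sketch, where the helper name poster_url_from_session_url and the constant POSTER_BASE are made up for illustration:

```python
from urllib.parse import urlparse

# Assumed pattern, inferred from the sample output only.
POSTER_BASE = "https://meetings.asco.org/session/"


def poster_url_from_session_url(session_url: str) -> str:
    """Build the 'View Posters' URL from a session title URL."""
    # urlparse separates the path from any query string such as
    # ?presentation=205955, so only the path is inspected.
    path = urlparse(session_url).path
    session_id = path.rstrip("/").rsplit("/", 1)[-1]
    if not session_id.isdigit():
        raise ValueError(f"no numeric session id in {session_url!r}")
    return POSTER_BASE + session_id


print(poster_url_from_session_url(
    "https://meetings.asco.org/2022-asco-gastrointestinal-cancers-symposium/14169?presentation=205955"
))
```

Reading the href with the XPath above is still the safer choice, since it does not depend on the URL pattern staying the same.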

CodePudding user response:

You can also get the data you want from the JSON response of the site's API call, as follows:

Script

import requests
import json

body = {
    "operationName": "Search",
    "variables": {
        "q": "*",
        "sortBy": "Relevancy",
        "size": "50",
        "pageNumber": "1",
        "filters": {
            "contentTypeGroupLabel": [
                {
                  "key": "Sessions"
                }
            ],
            "meetingId": [
                {
                    "key": "286"
                }
            ]
        },
        "pages": [
            "1"
        ]
    },
    "query": "query Search($q: String!, $filters: SearchFilters, $pageNumber: Int, $size: Int, $sortBy: SearchResultsSortBy, $groupBy: SearchGroupBy, $groupSize: Int, $searchFields: [SearchField]) {\n  search(\n    q: $q\n    filters: $filters\n    pageNumber: $pageNumber\n    size: $size\n    sortBy: $sortBy\n    groupBy: $groupBy\n    groupSize: $groupSize\n    searchFields: $searchFields\n  ) {\n    status\n    result {\n      suggestion\n      groups {\n        total\n        hits {\n          ...SearchHitFields\n          innerHits {\n            total\n            hits {\n              ...SearchHitFields\n              __typename\n            }\n            __typename\n          }\n          __typename\n        }\n        __typename\n      }\n      aggregations {\n        meetingYear {\n          key\n          doc_count\n          __typename\n        }\n        sessionType {\n          key\n          doc_count\n          __typename\n        }\n        meetingTypeName {\n          key\n          doc_count\n          __typename\n        }\n        topic {\n          key\n          doc_count\n          children {\n            key\n            doc_count\n            children {\n              key\n              doc_count\n              __typename\n            }\n            __typename\n          }\n          __typename\n        }\n        mediaType {\n          key\n          doc_count\n          __typename\n        }\n        contentTypeDisplayLabel {\n          key\n          doc_count\n          __typename\n        }\n        contentTypeGroupLabel {\n          key\n          doc_count\n          __typename\n        }\n        track {\n          key\n          doc_count\n          __typename\n        }\n        sessionStartTime {\n          key\n          doc_count\n          __typename\n        }\n        sessionDeliveryType {\n          key\n          doc_count\n          __typename\n        }\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment SearchHitFields on SearchHit {\n  contentId\n  uid\n  elapsedTime\n  contentTypeDisplayLabel\n  contentTypeGroupLabel\n  contentSourceLabel\n  meetingYear\n  meetingName\n  meetingTypeName\n  contentSourceId\n  title\n  summary\n  mediaTypes\n  firstAuthor {\n    fullNameWithDesignation\n    photoUrl {\n      path\n      target\n      title\n      fqdn\n      queryParams\n      __typename\n    }\n    role\n    __typename\n  }\n  sessionType\n  totalPresentations\n  abstract {\n    abstractNumber\n    posterBoardNumber\n    __typename\n  }\n  url {\n    path\n    target\n    title\n    fqdn\n    queryParams\n    __typename\n  }\n  dateTimePublished\n  score\n  presentations\n  primaryTrack\n  meetingAttendanceType\n  attendanceType\n  sessionDeliveryType\n  sessionAttendanceType\n  sessionLocation\n  sessionLiveBroadcastFlag\n  sessionDates {\n    start\n    end\n    timeZone\n    __typename\n  }\n  meetingId\n  isBookmarked\n  isInAgenda\n  applicationDates {\n    start\n    end\n    __typename\n  }\n  liveBroadcastUrl {\n    path\n    target\n    title\n    fqdn\n    queryParams\n    __typename\n  }\n  mediaSlides {\n    id\n    pages {\n      pageLowRes {\n        path\n        target\n        title\n        fqdn\n        queryParams\n        __typename\n      }\n      pageHiRes {\n        path\n        target\n        title\n        fqdn\n        queryParams\n        __typename\n      }\n      pagePpt {\n        path\n        target\n        title\n        fqdn\n        queryParams\n        __typename\n      }\n      previewLowRes {\n        path\n        target\n        title\n        fqdn\n        queryParams\n        __typename\n      }\n      previewHiRes {\n        path\n        target\n        title\n        fqdn\n        queryParams\n        __typename\n      }\n      pageNumber\n      __typename\n    }\n    pageTotal\n    pptDeck {\n      path\n      target\n      title\n      fqdn\n      queryParams\n      __typename\n    }\n    previewLowRes {\n      path\n      target\n      title\n      fqdn\n      queryParams\n      __typename\n    }\n    previewHiRes {\n      path\n      target\n      title\n      fqdn\n      queryParams\n      __typename\n    }\n    submitDate\n    __typename\n  }\n  __typename\n}\n"
}

headers = {
    "content-type": "application/json",
    "x-api-key": "da2-wgzqv6hk3bea3axyz6hslo33my",
    "x-datadog-origin": "rum",
    "x-datadog-parent-id": "3886352237136641415",
    "x-datadog-sampled": "1",
    "x-datadog-sampling-priority": "1",
    "x-datadog-trace-id": "4407777172597574699"
}

url = "https://api.asco.org/graphql"

r = requests.post(url, data=json.dumps(body), headers=headers)

response = r.json()['data']['search']['result']['groups']['hits']
# print(response)

for resp in response:
    rel_url = resp['url']['path']
    abs_url = f'https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium{rel_url}'
    print(abs_url)

Output:

https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14103/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14137/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14147/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14148/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14174/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14172/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14102/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14138/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14145/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14139/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14134/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14105/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14132/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14136/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14101/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14100/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14104/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14140/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14452/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14135/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14146/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14141/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14453/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14178/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14457/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14142/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14144/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14179/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14133/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14143/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14170/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14180/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14169/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14150/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14154/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14153/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14158/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14157/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14155/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14152/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14156/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14161/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14163/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14164/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14162/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14149/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14166/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14165/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14168/
https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium/session/14151/
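
The script above indexes straight into the response, so it raises as soon as any level is missing or has a different shape. The query asks for groups { hits { ... } }, and groups may come back as a list of groups rather than a single object; I have not confirmed which, so treat that as an assumption. A more defensive sketch that accepts either shape and skips hits without a path is below. The helper names iter_hits and session_urls and the sample payload are made up for illustration.

```python
BASE = "https://meetings.asco.org/meetings/2022-gastrointestinal-cancers-symposium"


def iter_hits(payload):
    """Yield every hit dict found under data.search.result.groups."""
    data = (payload or {}).get("data") or {}
    result = (data.get("search") or {}).get("result") or {}
    groups = result.get("groups") or []
    # Assumption: groups can be a single object or a list of objects.
    if isinstance(groups, dict):
        groups = [groups]
    for group in groups:
        for hit in group.get("hits") or []:
            yield hit


def session_urls(payload, base=BASE):
    """Return absolute session URLs for every hit that has a url.path."""
    urls = []
    for hit in iter_hits(payload):
        path = (hit.get("url") or {}).get("path")
        if path:
            urls.append(base + path)
    return urls


# Hand-written sample that mimics the shape of the query, not a real response.
sample = {"data": {"search": {"result": {"groups": [
    {"total": 1, "hits": [{"url": {"path": "/session/14103/"}}]}
]}}}}
print(session_urls(sample))
```

With the script above, this would be called as session_urls(r.json()) in place of the final loop.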