Question on web-scraping hyperlinks with element criteria using python on tennisexplorer.com

Time:11-14

The problem with the code below is that it prints every a href on the page. How can I change it so that it prints only the hyperlinks labelled "info" in the far-right column of the tables on the webpage "https://www.tennisexplorer.com/results/?type=atp-single&year=2022&month=09&day=08"?

import requests
from bs4 import BeautifulSoup
import pandas as pd

response = requests.get('https://www.tennisexplorer.com/results/?type=atp-single&year=2022&month=09&day=08')
webpage = response.content
soup = BeautifulSoup(response.text, "html.parser")

col1 = [a.get('href') for a in soup.find_all('a')]

print(pd.DataFrame({"MatchLink":col1}))

CodePudding user response:

Here is a way to get a one-column dataframe of URLs pointing to every match detail page:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

url = 'https://www.tennisexplorer.com/results/?type=atp-single&year=2022&month=09&day=08'

r = requests.get(url)
soup = bs(r.text, 'html.parser')
# Select every link inside the last cell of each table row
links = soup.select('table tbody tr td:last-child a')
# Collect the href attribute of each matched link
big_list = [link.get('href') for link in links]
df = pd.DataFrame(big_list, columns=['Url'])
print(df)

Result in terminal:

Url
0   /match-detail/?id=2188301
1   /match-detail/?id=2188937
2   /match-detail/?id=2188807
3   /match-detail/?id=2188867
4   /match-detail/?id=2188869
... ...
131 /results/?year=2022&month=09&day=11
132 /results/?year=2022&month=09&day=18
133 /results/?year=2022&month=09&day=25
134 /results/?year=2022&month=10&day=02
135 /results/?year=2022&month=10&day=09
136 rows × 1 columns

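As the last rows of the output show, the selector also picks up links to other results pages. You can filter the URLs further based on their text, for example if you only want the info links (the question did not make that clear).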
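A minimal sketch of such a text-based filter, assuming that the info links are exactly the hrefs that start with /match-detail/ (which is what the output above suggests). The helper name keep_match_links and the short sample list are made up for illustration; in the script above you would pass big_list instead.

```python
# Sample hrefs copied from the output above; in the real script this is big_list.
hrefs = [
    "/match-detail/?id=2188301",
    "/match-detail/?id=2188937",
    "/results/?year=2022&month=09&day=11",
    "/results/?year=2022&month=09&day=18",
]


def keep_match_links(links, prefix="/match-detail/"):
    """Return only the links that start with the given prefix.

    Entries that are None (an <a> tag without an href) are skipped.
    """
    return [link for link in links if link and link.startswith(prefix)]


match_links = keep_match_links(hrefs)
print(match_links)
```

Filtering on the href keeps the selector simple, at the cost of depending on the site's URL scheme staying the same.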

See BeautifulSoup documentation here: https://beautiful-soup-4.readthedocs.io/en/latest/

CodePudding user response:

You could add conditions to the list comprehension:

col1 = [
    a.get('href') for a in soup.find_all('a') if a.parent.name == 'td' 
    and a.string == 'info' and not a.parent.find_next_sibling()
]

Or you could be more specific with a CSS selector:

col1 = [a.get('href') for a in soup.select('td:last-child a[title="Click for match detail"][href]')]

Both return the same list of links:

  0   https://www.tennisexplorer.com/match-detail/?id=2188301
  1   https://www.tennisexplorer.com/match-detail/?id=2188937
  2   https://www.tennisexplorer.com/match-detail/?id=2188807
  3   https://www.tennisexplorer.com/match-detail/?id=2188867
  4   https://www.tennisexplorer.com/match-detail/?id=2188869
  5   https://www.tennisexplorer.com/match-detail/?id=2189045
  6   https://www.tennisexplorer.com/match-detail/?id=2189097
  7   https://www.tennisexplorer.com/match-detail/?id=2188871
  8   https://www.tennisexplorer.com/match-detail/?id=2189083
  9   https://www.tennisexplorer.com/match-detail/?id=2188569
 10   https://www.tennisexplorer.com/match-detail/?id=2188327
 11   https://www.tennisexplorer.com/match-detail/?id=2188369
 12   https://www.tennisexplorer.com/match-detail/?id=2188319
 13   https://www.tennisexplorer.com/match-detail/?id=2188741
 14   https://www.tennisexplorer.com/match-detail/?id=2188809
 15   https://www.tennisexplorer.com/match-detail/?id=2188789
 16   https://www.tennisexplorer.com/match-detail/?id=2188911
 17   https://www.tennisexplorer.com/match-detail/?id=2188875
 18   https://www.tennisexplorer.com/match-detail/?id=2188991
 19   https://www.tennisexplorer.com/match-detail/?id=2188813
 20   https://www.tennisexplorer.com/match-detail/?id=2188803
 21   https://www.tennisexplorer.com/match-detail/?id=2189015
 22   https://www.tennisexplorer.com/match-detail/?id=2189035
 23   https://www.tennisexplorer.com/match-detail/?id=2188989
 24   https://www.tennisexplorer.com/match-detail/?id=2189171
 25   https://www.tennisexplorer.com/match-detail/?id=2188853
 26   https://www.tennisexplorer.com/match-detail/?id=2188523
 27   https://www.tennisexplorer.com/match-detail/?id=2189073
 28   https://www.tennisexplorer.com/match-detail/?id=2189055
 29   https://www.tennisexplorer.com/match-detail/?id=2188967
 30   https://www.tennisexplorer.com/match-detail/?id=2188887
 31   https://www.tennisexplorer.com/match-detail/?id=2188795
 32   https://www.tennisexplorer.com/match-detail/?id=2188851
 33   https://www.tennisexplorer.com/match-detail/?id=2188943
 34   https://www.tennisexplorer.com/match-detail/?id=2188777
 35   https://www.tennisexplorer.com/match-detail/?id=2188883
 36   https://www.tennisexplorer.com/match-detail/?id=2189005
 37   https://www.tennisexplorer.com/match-detail/?id=2188827
 38   https://www.tennisexplorer.com/match-detail/?id=2188773
 39   https://www.tennisexplorer.com/match-detail/?id=2188941
 40   https://www.tennisexplorer.com/match-detail/?id=2188885
 41   https://www.tennisexplorer.com/match-detail/?id=2188765
 42   https://www.tennisexplorer.com/match-detail/?id=2188337
 43   https://www.tennisexplorer.com/match-detail/?id=2188521
 44   https://www.tennisexplorer.com/match-detail/?id=2188963
 45   https://www.tennisexplorer.com/match-detail/?id=2187855
 46   https://www.tennisexplorer.com/match-detail/?id=2188771
 47   https://www.tennisexplorer.com/match-detail/?id=2188729
 48   https://www.tennisexplorer.com/match-detail/?id=2188637
 49   https://www.tennisexplorer.com/match-detail/?id=2188635
 50   https://www.tennisexplorer.com/match-detail/?id=2188657
 51   https://www.tennisexplorer.com/match-detail/?id=2188753
 52   https://www.tennisexplorer.com/match-detail/?id=2188731
 53   https://www.tennisexplorer.com/match-detail/?id=2188751
 54   https://www.tennisexplorer.com/match-detail/?id=2188727
 55   https://www.tennisexplorer.com/match-detail/?id=2188947
 56   https://www.tennisexplorer.com/match-detail/?id=2188601
 57   https://www.tennisexplorer.com/match-detail/?id=2188907
 58   https://www.tennisexplorer.com/match-detail/?id=2189033
 59   https://www.tennisexplorer.com/match-detail/?id=2189085
 60   https://www.tennisexplorer.com/match-detail/?id=2189089
 61   https://www.tennisexplorer.com/match-detail/?id=2188997
 62   https://www.tennisexplorer.com/match-detail/?id=2188515
 63   https://www.tennisexplorer.com/match-detail/?id=2188571
 64   https://www.tennisexplorer.com/match-detail/?id=2189087
 65   https://www.tennisexplorer.com/match-detail/?id=2188605
 66   https://www.tennisexplorer.com/match-detail/?id=2189067
 67   https://www.tennisexplorer.com/match-detail/?id=2189091
 68   https://www.tennisexplorer.com/match-detail/?id=2188877
 69   https://www.tennisexplorer.com/match-detail/?id=2188849
 70   https://www.tennisexplorer.com/match-detail/?id=2188889
 71   https://www.tennisexplorer.com/match-detail/?id=2188841
 72   https://www.tennisexplorer.com/match-detail/?id=2188955
 73   https://www.tennisexplorer.com/match-detail/?id=2189001
 74   https://www.tennisexplorer.com/match-detail/?id=2188891
 75   https://www.tennisexplorer.com/match-detail/?id=2188843
 76   https://www.tennisexplorer.com/match-detail/?id=2188999
 77   https://www.tennisexplorer.com/match-detail/?id=2188939
 78   https://www.tennisexplorer.com/match-detail/?id=2188983
 79   https://www.tennisexplorer.com/match-detail/?id=2188341
 80   https://www.tennisexplorer.com/match-detail/?id=2189011
 81   https://www.tennisexplorer.com/match-detail/?id=2189009
 82   https://www.tennisexplorer.com/match-detail/?id=2188899
 83   https://www.tennisexplorer.com/match-detail/?id=2188829
 84   https://www.tennisexplorer.com/match-detail/?id=2188903
 85   https://www.tennisexplorer.com/match-detail/?id=2188797
 86   https://www.tennisexplorer.com/match-detail/?id=2188775
 87   https://www.tennisexplorer.com/match-detail/?id=2188791
 88   https://www.tennisexplorer.com/match-detail/?id=2188905
 89   https://www.tennisexplorer.com/match-detail/?id=2188811
 90   https://www.tennisexplorer.com/match-detail/?id=2188845
 91   https://www.tennisexplorer.com/match-detail/?id=2188767
 92   https://www.tennisexplorer.com/match-detail/?id=2188749
 93   https://www.tennisexplorer.com/match-detail/?id=2188517
 94   https://www.tennisexplorer.com/match-detail/?id=2188799
 95   https://www.tennisexplorer.com/match-detail/?id=2188817
 96   https://www.tennisexplorer.com/match-detail/?id=2188897
 97   https://www.tennisexplorer.com/match-detail/?id=2188847
 98   https://www.tennisexplorer.com/match-detail/?id=2188879
 99   https://www.tennisexplorer.com/match-detail/?id=2188901
100   https://www.tennisexplorer.com/match-detail/?id=2188545
101   https://www.tennisexplorer.com/match-detail/?id=2188839
102   https://www.tennisexplorer.com/match-detail/?id=2188343
103   https://www.tennisexplorer.com/match-detail/?id=2188821
104   https://www.tennisexplorer.com/match-detail/?id=2188547
105   https://www.tennisexplorer.com/match-detail/?id=2188801
106   https://www.tennisexplorer.com/match-detail/?id=2188893
107   https://www.tennisexplorer.com/match-detail/?id=2188769
108   https://www.tennisexplorer.com/match-detail/?id=2188519
109   https://www.tennisexplorer.com/match-detail/?id=2188815
110   https://www.tennisexplorer.com/match-detail/?id=2189125
111   https://www.tennisexplorer.com/match-detail/?id=2189123
112   https://www.tennisexplorer.com/match-detail/?id=2189121
113   https://www.tennisexplorer.com/match-detail/?id=2190121
114   https://www.tennisexplorer.com/match-detail/?id=2189115
115   https://www.tennisexplorer.com/match-detail/?id=2189117
116   https://www.tennisexplorer.com/match-detail/?id=2189127
117   https://www.tennisexplorer.com/match-detail/?id=2190175
118   https://www.tennisexplorer.com/match-detail/?id=2189109
119   https://www.tennisexplorer.com/match-detail/?id=2189111
120   https://www.tennisexplorer.com/match-detail/?id=2189113
121   https://www.tennisexplorer.com/match-detail/?id=2188859
122   https://www.tennisexplorer.com/match-detail/?id=2188857
123   https://www.tennisexplorer.com/match-detail/?id=2188861
124   https://www.tennisexplorer.com/match-detail/?id=2189155
125   https://www.tennisexplorer.com/match-detail/?id=2189157
126   https://www.tennisexplorer.com/match-detail/?id=2189153
127   https://www.tennisexplorer.com/match-detail/?id=2189151
128   https://www.tennisexplorer.com/match-detail/?id=2189149
129   https://www.tennisexplorer.com/match-detail/?id=2189079

[printed with print('\n'.join(f'{i:>3} https://www.tennisexplorer.com{a}' for i, a in enumerate(col1)))]
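The hrefs in col1 are relative, which is why the domain is prepended when printing. If you want absolute URLs in the list itself, one option is urljoin from the standard library. This is only a sketch: BASE_URL and the helper name to_absolute are made up for illustration, and the two sample hrefs are taken from the output above.

```python
from urllib.parse import urljoin

# Assumed base of the site; relative hrefs are resolved against it.
BASE_URL = "https://www.tennisexplorer.com/"


def to_absolute(hrefs, base=BASE_URL):
    """Resolve each relative href against the base URL."""
    return [urljoin(base, href) for href in hrefs]


# Two relative hrefs as returned by either approach above.
col1 = ["/match-detail/?id=2188301", "/match-detail/?id=2188937"]
absolute = to_absolute(col1)
print("\n".join(absolute))
```

Unlike plain string concatenation, urljoin also leaves hrefs that are already absolute unchanged.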
