Home > Blockchain >  Issues with extracting 1 column from https://www.sbstransit.com.sg/fares-and-concessions
Issues with extracting 1 column from https://www.sbstransit.com.sg/fares-and-concessions

Time:01-30

I tried using web scarping to extract only one column from this websitebut could not extract succesfully

df = pd.read_html('https://www.sbstransit.com.sg/fares-and-concessions')
df
from urllib.request import urlopen
# from Beautifulsoup4 import BeautifulSoup
# or if you're using BeautifulSoup4:
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('https://www.sbstransit.com.sg/fares-and-concessions').read())

for row in soup('table', {'class': 'spad'})[0].tbody('tr'):
    tds = row('td')
    print(tds[0].string, tds[1].string)

I seriously need help,been trying this for hours already, its so hard just to extract 1 column :[

CodePudding user response:

What about using pandas.read_html and selecting needed table by index from list of tables:

pd.read_html('https://www.sbstransit.com.sg/fares-and-concessions', header=1)[1]

and to get only results from the column:

pd.read_html('https://www.sbstransit.com.sg/fares-and-concessions', header=1)[1]['DTL/NEL']

CodePudding user response:

What you have to do is navigate through the web site try this

from urllib.request import urlopen
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('https://www.sbstransit.com.sg/fares-and-concessions').read())

# get the first table body on the accordion
table = soup("ul", id="accordion")[0].li.table.tbody

for row in table("tr"):
    # get the 7th columm of each row
    print(row("td")[6].text)

I prefer to use scrapy we use it in my job, but if your are going to start on web scraping I recommend you to learn xpath it will help you to navigate.

  • Related