How to scrape multiple tables with same name?

Time:11-26

I am trying to scrape a site where all the tables share the same class name.

There are three tables. I want to read the headers just once and then write the rows from all three tables into a single xlsx file. The site is https://wiki.warthunder.com/List_of_vehicle_battle_ratings

Running the code with vehical = soup.find('table') works, but I only get the first table's information. I tried changing it to vehical = soup.find_all('table')

but that gives me this error:

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Here is my full code:

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

def updatebr():
    url='https://wiki.warthunder.com/List_of_vehicle_battle_ratings'
    headers =[]
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    vehical = soup.find('table')
    

    for i in vehical.find_all('th'):
        title = i.text
        headers.append(title)

    df = pd.DataFrame(columns = headers)

    for row in vehical.find_all('tr')[1:]:
        data = row.find_all('td')
        row_data = [td.text for td in data]
        length = len(df)
        df.loc[length] = row_data


    df.to_excel('brlist.xlsx')

Full traceback:

Traceback (most recent call last):
  File "c:\Python\WT\BRtest.py", line 35, in <module>
    updatebr()
  File "c:\Python\WT\BRtest.py", line 24, in updatebr
    test = vehical.find_all('tr')
  File "C:\lib\site-packages\bs4\element.py", line 2289, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

CodePudding user response:

You can make this simpler, since you already use pandas. pd.read_html() reads all matching tables into a list of DataFrames, and pd.concat() combines them into a single one:

pd.concat(
    pd.read_html(
        'https://wiki.warthunder.com/List_of_vehicle_battle_ratings',
        attrs={'class':'wikitable'}
    ),
    ignore_index=True
).to_excel('brlist.xlsx')
      country  type                name                     ab   rb   sb
0     Italy    Utility helicopter  A.109EOA-2               8.7  9    9.3
1     Italy    Attack helicopter   A-129 International (p)  9.7  10   9.7
...   ...      ...                 ...                      ...  ...  ...
1945  USSR     Frigate             Rosomacha                4    4    4
1946  USSR     Motor gun boat      Ya-5M                    1.3  1.3  1.3

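However, to answer your question: soup.find_all('table') returns a ResultSet, which is a list of tables rather than a single element, so you cannot call find_all() on it directly. You have to add a loop that iterates over the ResultSet and queries each table in turn. The example below uses select() with CSS selectors and stripped_strings to simplify the extraction.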

...
url='https://wiki.warthunder.com/List_of_vehicle_battle_ratings'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
vehical = soup.select('table.wikitable')

pd.DataFrame(
    [list(row.stripped_strings)
     for t in vehical 
     for row in t.select('tr:has(td)')
    ],
    columns=list(soup.table.tr.stripped_strings)
).to_excel('brlist.xlsx')
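
If you would rather stay close to your original find_all() code, the same idea can be sketched as below. This is only a minimal illustration: the HTML string is a small made-up stand-in for the wiki page (so it runs without network access), and collect_rows is a hypothetical helper name, not something from bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two tables that share the same class, like the wiki page.
HTML = """
<table class="wikitable">
  <tr><th>country</th><th>name</th><th>rb</th></tr>
  <tr><td>Italy</td><td>A-129</td><td>10.0</td></tr>
</table>
<table class="wikitable">
  <tr><th>country</th><th>name</th><th>rb</th></tr>
  <tr><td>USSR</td><td>Ya-5M</td><td>1.3</td></tr>
  <tr><td>USSR</td><td>Rosomacha</td><td>4.0</td></tr>
</table>
"""


def collect_rows(html):
    """Return (headers, rows) gathered from every table with class 'wikitable'."""
    soup = BeautifulSoup(html, "html.parser")

    # find_all() returns a ResultSet, i.e. a list of <table> elements.
    tables = soup.find_all("table", class_="wikitable")
    if not tables:
        return [], []

    # Take the headers once, from the first table only.
    headers = [th.get_text(strip=True) for th in tables[0].find_all("th")]

    rows = []
    for table in tables:  # loop over the ResultSet ...
        for tr in table.find_all("tr"):  # ... and query each table separately
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # header rows contain no <td>, so they are skipped
                rows.append(cells)
    return headers, rows


headers, rows = collect_rows(HTML)
```

From there, pd.DataFrame(rows, columns=headers).to_excel('brlist.xlsx') should give one sheet with a single header row, assuming every table has the same columns.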