Parsing table data from BeautifulSoup HTML Comment

Time:06-16

So I am trying to get a table off of Output

Any help is appreciated!

CodePudding user response:

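The table you want is inside an HTML comment, so a normal table search on the parsed page will not find it. You can use BeautifulSoup's built-in Comment class with a lambda function to collect the comments, pick the one that contains the table, and pass its markup to pandas.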

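from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://www.baseball-reference.com/register/team.cgi?id=9995d2a1'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')

# Collect every HTML comment, then keep the one that holds the pitching table
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
table_html = [c for c in comments if 'id="div_team_pitching"' in c][0]

# Wrap the markup in StringIO: newer pandas versions deprecate passing literal HTML strings
df = pd.read_html(StringIO(str(table_html)))[0]
print(df)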

Output:

      Rk                      Name   Age  W  L   W-L%  ...    H9   HR9   BB9   SO9  SO/W  Notes
0    1.0  Logan Bursick-Harrington  21.0  0  2  0.000  ...   4.5   0.0  15.8  15.8  1.00    NaN
1    2.0                Cylis Cox*  19.0  1  0  1.000  ...  23.1   0.0   7.7  11.6  1.50    NaN
2    3.0          Travis Densmore*  21.0  0  1  0.000  ...   7.2   0.0   1.8  14.4  8.00    NaN
3    4.0             Dylan Freeman  22.0  1  0  1.000  ...  13.5   1.1   3.4  14.6  4.33    NaN
4    5.0              Zach Hopman*  22.0  0  1  0.000  ...  12.8   0.0   9.9  11.4  1.14    NaN
5    6.0            Eamon Horwedel  22.0  1  0  1.000  ...   9.0   0.0   6.4   6.4  1.00    NaN
6    7.0             Tyler Johnson  19.0  0  0    NaN  ...   5.4   0.0   2.7  10.8  4.00    NaN
7    8.0               Trent Jones  20.0  0  0    NaN  ...  14.6   1.1   2.3  12.4  5.50    NaN
8    9.0              Tanner Knapp  21.0  1  1  0.500  ...  11.6   0.0   7.7   4.8  0.63    NaN
9   10.0              Mason Majors  22.0  1  0  1.000  ...   4.9   0.0   7.4  12.3  1.67    NaN
10  11.0               Mason Meeks  21.0  0  1  0.000  ...   6.3   0.9   3.6   5.4  1.50    NaN
11  12.0            Sam Nagelvoort  19.0  0  1  0.000  ...  18.0   2.3  22.5   9.0  0.40    NaN
12  13.0              Tyler Nichol  20.0  0  0    NaN  ...  27.0   0.0  27.0   0.0  0.00    NaN
13  14.0                Cole Russo  19.0  0  0    NaN  ...  27.0  13.5   0.0   0.0   NaN    NaN
14  15.0              Kyle Salley*  22.0  0  1  0.000  ...   9.0   2.3  22.5   9.0  0.40    NaN
15  16.0               Noah Stants  21.0  0  0    NaN  ...   4.3   1.4   7.1  11.4  1.60    NaN
16  17.0         Quinn Waterhouse*  21.0  0  0    NaN  ...   4.5   0.0   4.5  18.0  4.00    NaN
17  18.0              Nick Weyrich  19.0  0  0    NaN  ...   6.4   1.3   7.7  11.6  1.50    NaN
18  19.0              Adam Wheaton  23.0  0  1  0.000  ...  11.7   1.8   4.5  12.6  2.80    NaN
19   NaN                19 Players  20.9  5  9  0.357  ...   9.2   0.8   6.9  10.7  1.55    NaN

[20 rows x 32 columns]
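
If you want to see the comment trick without hitting the site, here is a minimal offline sketch. The `SAMPLE_HTML` fragment, the player names, and the helper `table_rows_from_comment` are made up for illustration and are not taken from the real page; only the `id="div_team_pitching"` marker comes from the answer above. It uses the `html.parser` backend and plain lists instead of pandas, so treat it as an illustration of the idea rather than a drop-in replacement.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page fragment: the table sits inside an HTML comment,
# so searching the outer document for <table> finds nothing.
SAMPLE_HTML = """
<div id="all_team_pitching">
<!--
<div id="div_team_pitching">
<table>
<tr><th>Name</th><th>W</th><th>L</th></tr>
<tr><td>Player A</td><td>1</td><td>0</td></tr>
<tr><td>Player B</td><td>0</td><td>2</td></tr>
</table>
</div>
-->
</div>
"""


def table_rows_from_comment(html, marker):
    """Return the rows of the first table found in a comment that contains marker."""
    soup = BeautifulSoup(html, "html.parser")
    # Comment nodes are strings, so filter the document's strings by type.
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        if marker not in comment:
            continue
        # Re-parse the comment text as HTML to reach the hidden table.
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table")
        if table is None:
            continue
        return [
            [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
            for row in table.find_all("tr")
        ]
    return []


rows = table_rows_from_comment(SAMPLE_HTML, 'id="div_team_pitching"')
print(rows)
```

The first row holds the header cells and the remaining rows hold the data. On the real page you would pass `req.text` instead of `SAMPLE_HTML`, or hand the matching comment to `pd.read_html` as in the answer.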