How do I separate this 1 item list into multiple lists?

Time:07-07

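from urllib.request import urlopen
from bs4 import BeautifulSoup

# Assumed setup (not shown in the original post): fetch and parse the page
url = 'https://www.pro-football-reference.com/years/2021/passing.htm'
soup = BeautifulSoup(urlopen(url), 'html.parser')

# Builds one inner list per <tbody> (not per row), holding every <td> inside it
table_body = soup.findAll('tbody', class_ = lambda table_rows: table_rows != "thead")
table_data = [[td.getText() for td in table_body[i].findAll('td')]
                for i in range(len(table_body))]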

I'm working on a project that scrapes data from https://www.pro-football-reference.com/years/2021/passing.htm. My code to scrape the table headers works; however, I am having a lot of trouble formatting the table body so that each player's stats form a separate row. When I run print(table_data), the result is a one-item list:

[['Tom Brady*', 'TAM', '44', 'QB', '17', '17', '13-4-0', '485', '719', '67.5', '5316', '43', '6', '12', '1.7', '269', '62', '7.4', '7.8', '11.0', '312.7', '102.1', '68.1', '22', '144', '3', '6.98', '7.41', '3', '5', 'Justin Herbert*', 'LAC', '23', 'QB', '17', '17', '9-8-0', '443', '672', '65.9', '5014', '38', '5.7', '15', '2.2', '256', '72', '7.5', '7.6', '11.3', '294.9', '97.7', '65.6', '31', '214', '4.4', '6.83', '6.95', '5', '5', 'Matthew Stafford', 'LAR', '33', 'QB', '17', '17', '12-5-0', '404', '601', '67.2', '4886', '41', '6.8', '17', '2.8', '233', '79', '8.1', '8.2', '12.1', '287.4', '102.9', '63.8', '30', '243', '4.8', '7.36', '7.45', '3', '4',....]]

How do I separate this one-item list into multiple lists to achieve the desired output below?

[
['Tom Brady*', 'TAM', '44', 'QB', '17', '17', '13-4-0', '485', '719', '67.5', '5316', '43', '6', '12', '1.7', '269', '62', '7.4', '7.8', '11.0', '312.7', '102.1', '68.1', '22', '144', '3', '6.98', '7.41', '3', '5']
['Justin Herbert*', 'LAC', '23', 'QB', '17', '17', '9-8-0', '443', '672', '65.9', '5014', '38', '5.7', '15', '2.2', '256', '72', '7.5', '7.6', '11.3', '294.9', '97.7', '65.6', '31', '214', '4.4', '6.83', '6.95', '5', '5']
['Matthew Stafford', 'LAR', '33', 'QB', '17', '17', '12-5-0', '404', '601', '67.2', '4886', '41', '6.8', '17', '2.8', '233', '79', '8.1', '8.2', '12.1', '287.4', '102.9', '63.8', '30', '243', '4.8', '7.36', '7.45', '3', '4']
['Patrick Mahomes'...]
['Derek Carr'...]
]

Answer:

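Iterate over the rows of the table and, for each row, over its <td> cells to get their text. The original code builds one inner list per <tbody>, so all cells end up in a single list; building one inner list per <tr> gives one list per player: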

[[e.text for e in row.select('td')] for row in soup.select('tbody tr')]

Output:

[['Tom Brady*', 'TAM', '44', 'QB', '17', '17', '13-4-0', '485', '719', '67.5', '5316', '43', '6', '12', '1.7', '269', '62', '7.4', '7.8', '11.0', '312.7', '102.1', '68.1', '22', '144', '3', '6.98', '7.41', '3', '5'], ['Justin Herbert*', 'LAC', '23', 'QB', '17', '17', '9-8-0', '443', '672', '65.9', '5014', '38', '5.7', '15', '2.2', '256', '72', '7.5', '7.6', '11.3', '294.9', '97.7', '65.6', '31', '214', '4.4', '6.83', '6.95', '5', '5'], ['Matthew Stafford', 'LAR', '33', 'QB', '17', '17', '12-5-0', '404', '601', '67.2', '4886', '41', '6.8', '17', '2.8', '233', '79', '8.1', '8.2', '12.1', '287.4', '102.9', '63.8', '30', '243', '4.8', '7.36', '7.45', '3', '4'], ['Patrick Mahomes*', 'KAN', '26', 'QB', '17', '17', '12-5-0', '436', '658', '66.3', '4839', '37', '5.6', '13', '2', '260', '75', '7.4', '7.6', '11.1', '284.6', '98.5', '62.2', '28', '146', '4.1', '6.84', '7.07', '3', '3'], ['Derek Carr', 'LVR', '30', 'QB', '17', '17', '10-7-0', '428', '626', '68.4', '4804', '23', '3.7', '14', '2.2', '217', '61', '7.7', '7.4', '11.2', '282.6', '94.0', '52.4', '40', '241', '6', '6.85', '6.60', '3', '6'], ['Joe Burrow', 'CIN', '25', 'QB', '16', '16', '10-6-0', '366', '520', '70.4', '4611', '34', '6.5', '14', '2.7', '202', '82', '8.9', '9.0', '12.6', '288.2', '108.3', '54.3', '51', '370', '8.9', '7.43', '7.51', '2', '3'], ['Dak Prescott', 'DAL', '28', 'QB', '16', '16', '11-5-0', '410', '596', '68.8', '4449', '37', '6.2', '10', '1.7', '227', '51', '7.5', '8.0', '10.9', '278.1', '104.2', '54.6', '30', '144', '4.8', '6.88', '7.34', '1', '2'], ['Josh Allen', 'BUF', '25', 'QB', '17', '17', '11-6-0', '409', '646', '63.3', '4407', '36', '5.6', '15', '2.3', '234', '61', '6.8', '6.9', '10.8', '259.2', '92.2', '60.7', '26', '164', '3.9', '6.31', '6.38', '', ''], ['Kirk Cousins*', 'MIN', '33', 'QB', '16', '16', '8-8-0', '372', '561', '66.3', '4221', '33', '5.9', '7', '1.2', '192', '64', '7.5', '8.1', '11.3', '263.8', '103.1', '52.3', '28', '197', '4.8', '6.83', '7.42', '3', '4'], ['Aaron Rodgers* ', 'GNB', '38', 'QB', '16', '16', '13-3-0', '366', '531', '68.9', '4115', '37', '7', '4', '0.8', '213', '75', '7.7', '8.8', '11.2', '257.2', '111.9', '69.1', '30', '188', '5.3', '7.00', '8.00', '1', '2'], ['Matt Ryan', 'ATL', '36', 'QB', '17', '17', '7-10-0', '375', '560', '67', '3968', '20', '3.6', '12', '2.1', '195', '64', '7.1', '6.8', '10.6', '233.4', '90.4', '46.1', '40', '274', '6.7', '6.16', '5.92', '3', '4'], ['Jimmy Garoppolo', 'SFO', '30', 'QB', '15', '15', '9-6-0', '301', '441', '68.3', '3810', '20', '4.5', '12', '2.7', '172', '83', '8.6', '8.3', '12.7', '254.0', '98.7', '53.3', '29', '201', '6.2', '7.68', '7.38', '3', '3'],...]
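
To see the difference on something small, here is a minimal sketch that runs both approaches on a hypothetical, heavily trimmed table. SAMPLE_HTML is made up for illustration; the real page has many more columns and rows. It also assumes that repeated header rows contain only <th> cells, so they produce no <td> text and can be skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the stats table (assumption, not the real markup).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Rk</th><th>Player</th><th>Tm</th><th>Age</th></tr></thead>
  <tbody>
    <tr><th>1</th><td>Tom Brady*</td><td>TAM</td><td>44</td></tr>
    <tr><th>2</th><td>Justin Herbert*</td><td>LAC</td><td>23</td></tr>
    <tr class="thead"><th>Rk</th><th>Player</th><th>Tm</th><th>Age</th></tr>
    <tr><th>3</th><td>Matthew Stafford</td><td>LAR</td><td>33</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One inner list per <tbody>: with a single <tbody>, every <td> lands in one list.
flat = [[td.get_text() for td in body.find_all("td")]
        for body in soup.find_all("tbody")]

# One inner list per <tr>; rows without any <td> (repeated headers) are skipped.
rows = []
for tr in soup.select("tbody tr"):
    cells = [td.get_text() for td in tr.select("td")]
    if cells:
        rows.append(cells)

print(len(flat), len(rows))
```

On this sample, flat holds a single list with all nine cells, while rows holds three lists of three cells each.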

As an alternative, pandas.read_html() is an easy and common way to handle this kind of task; it parses the HTML tables for you, using a parser such as lxml or BeautifulSoup under the hood.

Example:

import pandas as pd

# read the first table from the URL into a DataFrame
df = pd.read_html('https://www.pro-football-reference.com/years/2021/passing.htm')[0]
# select only the rows that are not repeated subheaders
df[df['Rk'] != 'Rk']

Output:

Rk Player Tm Age Pos G GS QBrec Cmp Att Cmp% Yds TD TD% Int Int% 1D Lng Y/A AY/A Y/C Y/G Rate QBR Sk Yds.1 Sk% NY/A ANY/A 4QC GWD
1 Tom Brady* TAM 44 QB 17 17 13-4-0 485 719 67.5 5316 43 6 12 1.7 269 62 7.4 7.8 11 312.7 102.1 68.1 22 144 3 6.98 7.41 3 5
2 Justin Herbert* LAC 23 QB 17 17 9-8-0 443 672 65.9 5014 38 5.7 15 2.2 256 72 7.5 7.6 11.3 294.9 97.7 65.6 31 214 4.4 6.83 6.95 5 5
3 Matthew Stafford LAR 33 QB 17 17 12-5-0 404 601 67.2 4886 41 6.8 17 2.8 233 79 8.1 8.2 12.1 287.4 102.9 63.8 30 243 4.8 7.36 7.45 3 4
4 Patrick Mahomes* KAN 26 QB 17 17 12-5-0 436 658 66.3 4839 37 5.6 13 2 260 75 7.4 7.6 11.1 284.6 98.5 62.2 28 146 4.1 6.84 7.07 3 3
5 Derek Carr LVR 30 QB 17 17 10-7-0 428 626 68.4 4804 23 3.7 14 2.2 217 61 7.7 7.4 11.2 282.6 94 52.4 40 241 6 6.85 6.6 3 6
6 Joe Burrow CIN 25 QB 16 16 10-6-0 366 520 70.4 4611 34 6.5 14 2.7 202 82 8.9 9 12.6 288.2 108.3 54.3 51 370 8.9 7.43 7.51 2 3
7 Dak Prescott DAL 28 QB 16 16 11-5-0 410 596 68.8 4449 37 6.2 10 1.7 227 51 7.5 8 10.9 278.1 104.2 54.6 30 144 4.8 6.88 7.34 1 2

...
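
If the flat one-item list is all that is available, it can also be split after the fact. This is only a sketch: split_rows is a hypothetical helper, and it assumes every player contributes the same number of cells (30 in the rows shown in the question). If any row has a different cell count, the split would silently misalign, which is why scraping row by row is the safer route. The sample data is shortened to 4 cells per player for readability.

```python
def split_rows(flat_cells, width):
    """Split a flat list of cells into consecutive rows of `width` cells each."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(flat_cells) % width != 0:
        raise ValueError("cell count is not a multiple of the row width")
    return [flat_cells[i:i + width] for i in range(0, len(flat_cells), width)]


# Shortened, illustrative data: 4 cells per player instead of the 30 in the question.
table_data = [['Tom Brady*', 'TAM', '44', 'QB',
               'Justin Herbert*', 'LAC', '23', 'QB']]

rows = split_rows(table_data[0], 4)
print(rows)
```

For the list in the question, the call would be split_rows(table_data[0], 30).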
