DataFrame not showing complete table data-CodePudding

I web-scraped some information about S&P 500 stocks from this website: https://www.slickcharts.com/sp500. The actual web-scraping bit works fine, as if I add a print statement after the for loop included, all data is displayed. In other words, the code:

# Web-scraped S&P 500 data for 500  US stocks.

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}

url = 'https://www.slickcharts.com/sp500' # Data from SlickCharts
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

table1 = soup.find('table', attrs={'class':'table table-hover table-borderless table-sm'})

for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:
        company = all_td_tags[1].text
        symbol = all_td_tags[2].text
        weight = all_td_tags[3].text
        price = all_td_tags[4].text
        chg = all_td_tags[5].text
        perChg = all_td_tags[6].text
        print(company, '|', symbol, '|', weight, '|', price, '|', chg, '|', perChg)

Outputs:

Apple Inc. | AAPL | 6.866056 |    176.34 | 0.06 | (0.03%)
Microsoft Corporation | MSFT | 6.279809 |    334.50 | -0.19 | (-0.06%)
Amazon.com Inc. | AMZN | 3.729209 |    3,418.46 | -2.91 | (-0.09%)
Alphabet Inc. Class A | GOOGL | 2.208863 |    2,938.00 | -0.33 | (-0.01%)
Tesla Inc | TSLA | 2.169114 |    1,069.30 | 2.30 | (0.22%)
Alphabet Inc. Class C | GOOG | 2.056323 |    2,942.00 | -0.85 | (-0.03%)
Meta Platforms Inc. Class A | FB | 1.982391 |    336.00 | 0.76 | (0.23%)
NVIDIA Corporation | NVDA | 1.851853 |    295.60 | -0.80 | (-0.27%)
...

However, using DataFrame (I want to use it so I can search for data for a specific stock, example I type "AAPL" and I get the stock's price, weight, etc.), when writing this code:

# Web-scraped S&P 500 data for 500  US stocks.

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}

url = 'https://www.slickcharts.com/sp500' # Data from SlickCharts
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

table1 = soup.find('table', attrs={'class':'table table-hover table-borderless table-sm'})

for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:
        company = all_td_tags[1].text
        symbol = all_td_tags[2].text
        weight = all_td_tags[3].text
        price = all_td_tags[4].text
        chg = all_td_tags[5].text
        perChg = all_td_tags[6].text

df = pd.DataFrame({'Company': [company], 'Symbol': [symbol], 'Weight': [weight], 'Price': [price], 'Change': [chg], 'Percent_Change': [perChg]})

print(df.head())

I only get the information for one stock, when I should get the entire table:

                    Company Symbol    Weight     Price Change Percent_Change
0  News Corporation Class B    NWS  0.006948     22.75   0.20        (0.89%)

Anything I did wrong with DataFrame so that it only displays that one stock (the stock displayed is coincidentally the last one in the table)?

UPDATE

I replaced the definition of df like so:

# Web-scraped S&P 500 data for 500  US stocks.

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}

url = 'https://www.slickcharts.com/sp500' # Data from SlickCharts
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

table1 = soup.find('table', attrs={'class':'table table-hover table-borderless table-sm'})

for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:
        company = all_td_tags[1].text
        symbol = all_td_tags[2].text
        weight = all_td_tags[3].text
        price = all_td_tags[4].text
        chg = all_td_tags[5].text
        perChg = all_td_tags[6].text
        # print(company, '|', symbol, '|', weight, '|', price, '|', chg, '|', perChg)

df = pd.read_html(str(table1))[0]
print(df)

However my output looks something like this:

       #                    Company Symbol    Weight    Price   Chg     % Chg
0      1                 Apple Inc.   AAPL  6.866056   176.34  0.06   (0.03%)
1      2      Microsoft Corporation   MSFT  6.279809   334.50 -0.19  (-0.06%)
2      3            Amazon.com Inc.   AMZN  3.729209  3418.46 -2.91  (-0.09%)
3      4      Alphabet Inc. Class A  GOOGL  2.208863  2938.00 -0.33  (-0.01%)
4      5                  Tesla Inc   TSLA  2.169114  1069.30  2.30   (0.22%)
..   ...                        ...    ...       ...      ...   ...       ...
500  501     Discovery Inc. Class A  DISCA  0.009951    24.25 -0.17  (-0.70%)
501  502  Under Armour Inc. Class A    UAA  0.009792    20.62  0.00   (0.00%)
502  503                   Gap Inc.    GPS  0.008945    17.28  0.00   (0.00%)
503  504  Under Armour Inc. Class C     UA  0.008667    17.55  0.00   (0.00%)
504  505   News Corporation Class B    NWS  0.006948    22.75  0.20   (0.89%)

How do I make the second column of numbers disappear?

CodePudding user response：

Because you keep reassigning company, symbol, weight, etc. on each iteration, these variables only hold the values from the last row you parsed.

You can use pd.read_html instead. It returns a list of data frames, one for each <table> tag in the HTML snippet. There is only one table that you found through soup.find so it's element #0:

df = pd.read_html(str(table1))[0]

Output:

 #               Company Symbol   Weight   Price   Chg    % Chg
 1            Apple Inc.   AAPL 6.866056  176.34  0.06  (0.03%)
 2 Microsoft Corporation   MSFT 6.279809  334.50 -0.19 (-0.06%)
 3       Amazon.com Inc.   AMZN 3.729209 3418.46 -2.91 (-0.09%)
 4 Alphabet Inc. Class A  GOOGL 2.208863 2938.00 -0.33 (-0.01%)
 5             Tesla Inc   TSLA 2.169114 1069.30  2.30  (0.22%)
...

Trim and rename the frame as needed.