need help scraping a table; keeps repeating same row instead of printing each row

Time:10-11

I'm trying to scrape some HTML data to get the carbon emissions for every country. When I run this code, it prints the first country and its numbers over and over again. How do I get the data for each country instead of the same one?

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import re
html = urlopen("https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita")
soup = BeautifulSoup(html, "html.parser")
body = soup.body
world = soup.find_all("tbody")[1]
tr = body.find_all("tr")
nation = a.string
eighty = body.find_all("td")[3]
eighteen = body.find_all("td")[16]
selection =(nation, eighty.string, eighteen.string)
for tr in world:
    print(selection)

This is the page I'm trying to scrape; it's the first table:

https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita

User response:

Since you're already importing pandas, it is much simpler (and best practice) to scrape tables with pd.read_html:

pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita')[1]

Otherwise, you have to select your elements more specifically and move the variable assignments inside your loop:

from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen("https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita")
soup = BeautifulSoup(html, "html.parser")

# Iterate over every data row (a <tr> containing at least one <td>) of the target table
for tr in soup.select('table:nth-of-type(2) tr:has(td)'):
    # Look the cells up on the current row, not on the whole document,
    # so each iteration yields that row's values
    nation = tr.td.a.text
    eighty = tr.find_all("td")[3]
    eighteen = tr.find_all("td")[14]
    selection = (nation, eighty.text, eighteen.text)
    print(selection)
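To see in isolation why moving the lookups into the loop matters, here is a minimal sketch on a small hypothetical table. `SAMPLE_HTML`, `rows_wrong` and `rows_right` are names made up for illustration, and the three-column layout is an assumption (the real Wikipedia table has many more columns); the values are the 1980 and 2018 figures from the pandas output below. It needs only beautifulsoup4 and no network access.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature stand-in for the Wikipedia table (layout assumed).
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>1980</th><th>2018</th></tr>
  <tr><td><a href="#">Afghanistan</a></td><td>0.1</td><td>0.3</td></tr>
  <tr><td><a href="#">Albania</a></td><td>1.9</td><td>1.6</td></tr>
  <tr><td><a href="#">Algeria</a></td><td>3.4</td><td>3.9</td></tr>
</table>
"""


def rows_wrong(soup):
    # The pattern from the question: cells are looked up once on the whole
    # document, before the loop, so every iteration reuses the same tuple.
    cells = soup.find_all("td")
    selection = (cells[0].a.text, cells[1].text, cells[2].text)
    return [selection for _ in soup.select("tr:has(td)")]


def rows_right(soup):
    # The fix: look the cells up on the current <tr>, inside the loop.
    result = []
    for tr in soup.select("tr:has(td)"):
        cells = tr.find_all("td")
        result.append((cells[0].a.text, cells[1].text, cells[2].text))
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(rows_wrong(soup))  # the Afghanistan tuple three times
print(rows_right(soup))  # one tuple per country
```

The only difference between the two functions is where the `find_all("td")` call happens: relative to the document once, or relative to each row.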

User response:

That's an awful lot of work when pandas can do the heavy lifting for you:

import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita'
df = pd.read_html(url)[1]

Output:

print(df)
                       Unnamed: 0 1980 1985 1990 1995  ... 2012 2013 2014 2015 2018
0                     Afghanistan  0.1  0.3  0.2  0.1  ...  0.4  0.3  0.3  0.3  0.3
1                         Albania  1.9  2.7  1.7  0.7  ...  1.7  1.7  2.0  1.6  1.6
2                         Algeria  3.4  3.2  3.0  3.3  ...  3.5  3.5  3.7  3.9  3.9
3                  American Samoa   ..   ..   ..   ..  ...  0.7  0.7  0.7  0.8   ..
4                         Andorra   ..   ..  7.5  6.7  ...  5.9  5.9  5.8  5.9  6.0
..                            ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
218  United States Virgin Islands   ..   ..   ..   ..  ...   ..   ..   ..   ..   ..
219            West Bank and Gaza   ..   ..   ..   ..  ...  0.5  0.6   ..   ..   ..
220                         Yemen  0.4  0.9  0.8  0.7  ...  0.7  1.0  0.9  0.5  0.4
221                        Zambia  0.6  0.4  0.3  0.2  ...  0.2  0.3  0.3  0.3  0.3
222                      Zimbabwe  1.3  1.2  1.5  1.3  ...  0.5  0.8  0.8  0.8  0.8

[223 rows x 15 columns]
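If, like the question, you only want the country plus the 1980 and 2018 columns, the post-processing could look roughly like this. The sketch rebuilds a few of the rows printed above by hand instead of downloading the page. It assumes the column labels are the strings "1980" and "2018" and that missing values appear as "..". Check df.columns on the real result, since the Wikipedia layout may have changed.

```python
import pandas as pd

# A few rows copied from the printed output above, standing in for
# pd.read_html(url)[1]; string column labels are an assumption.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["Afghanistan", "Albania", "American Samoa"],
        "1980": ["0.1", "1.9", ".."],
        "2018": ["0.3", "1.6", ".."],
    }
)

# Keep only the country name and the two years of interest.
subset = (
    df[["Unnamed: 0", "1980", "2018"]]
    .rename(columns={"Unnamed: 0": "Country"})
    .copy()
)

# The table marks missing values with "..", so coerce them to NaN.
for col in ["1980", "2018"]:
    subset[col] = pd.to_numeric(subset[col], errors="coerce")

print(subset)
```

Alternatively, pd.read_html accepts an na_values argument, so passing na_values=[".."] should turn those placeholders into NaN at parse time.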