Only getting the last data point of a web page when using my web scraping code

Time:01-13

This is probably a newbie problem, but I can't solve it. I found a couple of different web scraping snippets in YouTube tutorials, but each of them gives me only the last data point rather than a list of all of them. This is my code (using Jupyter Notebook):

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys= soup.find_all('div',class_='col-md-4 country')

for country in countrys:
    country_name = country.find('h3',class_='country-name').text.strip()
    capital = country.find('span',class_='country-capital').text
    population = country.find('span',class_='country-population').text
    data = [country_name, capital, population]

print(data)

Result:

['Zimbabwe', 'Harare', '11651858']

So the code returns only the last entry of the country list. How can I get a list of all the data?

CodePudding user response:

Create the data variable as a list outside the loop and append each record to it:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys= soup.find_all('div',class_='col-md-4 country')

data = []  # <- HERE

for country in countrys:
    country_name = country.find('h3',class_='country-name').text.strip()
    capital = country.find('span',class_='country-capital').text
    population = country.find('span',class_='country-population').text
    data.append([country_name, capital, population])  # <- HERE

print(data)

Output:

[['Andorra', 'Andorra la Vella', '84000'],
 ['United Arab Emirates', 'Abu Dhabi', '4975593'],
 ['Afghanistan', 'Kabul', '29121286'],
 ['Antigua and Barbuda', "St. John's", '86754'],
 ['Anguilla', 'The Valley', '13254'],
 ['Albania', 'Tirana', '2986952'],
 ['Armenia', 'Yerevan', '2968000'],
 ['Angola', 'Luanda', '13068161'],
 ['Antarctica', 'None', '0'],
 ['Argentina', 'Buenos Aires', '41343201'],
 ['American Samoa', 'Pago Pago', '57881'],
 ['Austria', 'Vienna', '8205000'],
 ['Australia', 'Canberra', '21515754'],
 ['Aruba', 'Oranjestad', '71566'],
 ['Åland', 'Mariehamn', '26711'],
 ['Azerbaijan', 'Baku', '8303512'],
 ['Bosnia and Herzegovina', 'Sarajevo', '4590000'],
 ['Barbados', 'Bridgetown', '285653'],
 ['Bangladesh', 'Dhaka', '156118464'],
 ['Belgium', 'Brussels', '10403000'],
 ['Burkina Faso', 'Ouagadougou', '16241811'],
 ['Bulgaria', 'Sofia', '7148785'],
 ['Bahrain', 'Manama', '738004'],
 ['Burundi', 'Bujumbura', '9863117'],
 ['Benin', 'Porto-Novo', '9056010'],
 ['Saint Barthélemy', 'Gustavia', '8450'],
 ['Bermuda', 'Hamilton', '65365'],
 ['Brunei', 'Bandar Seri Begawan', '395027'],
 ['Bolivia', 'Sucre', '9947418'],
 ['Bonaire', 'Kralendijk', '18012'],
 ['Brazil', 'Brasília', '201103330'],
 ['Bahamas', 'Nassau', '301790'],
 ['Bhutan', 'Thimphu', '699847'],
 ['Bouvet Island', 'None', '0'],
 ['Botswana', 'Gaborone', '2029307'],
 ['Belarus', 'Minsk', '9685000'],
 ['Belize', 'Belmopan', '314522'],
 ['Canada', 'Ottawa', '33679000'],
 ['Cocos [Keeling] Islands', 'West Island', '628'],
 ['Democratic Republic of the Congo', 'Kinshasa', '70916439'],
 ['Central African Republic', 'Bangui', '4844927'],
 ['Republic of the Congo', 'Brazzaville', '3039126'],
 ['Switzerland', 'Bern', '7581000'],
 ['Ivory Coast', 'Yamoussoukro', '21058798'],
 ['Cook Islands', 'Avarua', '21388'],
 ['Chile', 'Santiago', '16746491'],
 ['Cameroon', 'Yaoundé', '19294149'],
 ['China', 'Beijing', '1330044000'],
 ['Colombia', 'Bogotá', '47790000'],
 ['Costa Rica', 'San José', '4516220'],
 ['Cuba', 'Havana', '11423000'],
 ['Cape Verde', 'Praia', '508659'],
 ['Curacao', 'Willemstad', '141766'],
 ['Christmas Island', 'Flying Fish Cove', '1500'],
 ['Cyprus', 'Nicosia', '1102677'],
 ['Czech Republic', 'Prague', '10476000'],
 ['Germany', 'Berlin', '81802257'],
 ['Djibouti', 'Djibouti', '740528'],
 ['Denmark', 'Copenhagen', '5484000'],
 ['Dominica', 'Roseau', '72813'],
 ['Dominican Republic', 'Santo Domingo', '9823821'],
 ['Algeria', 'Algiers', '34586184'],
 ['Ecuador', 'Quito', '14790608'],
 ['Estonia', 'Tallinn', '1291170'],
 ['Egypt', 'Cairo', '80471869'],
 ['Western Sahara', 'Laâyoune / El Aaiún', '273008'],
 ['Eritrea', 'Asmara', '5792984'],
 ['Spain', 'Madrid', '46505963'],
 ['Ethiopia', 'Addis Ababa', '88013491'],
 ['Finland', 'Helsinki', '5244000'],
 ['Fiji', 'Suva', '875983'],
 ['Falkland Islands', 'Stanley', '2638'],
 ['Micronesia', 'Palikir', '107708'],
 ['Faroe Islands', 'Tórshavn', '48228'],
 ['France', 'Paris', '64768389'],
 ['Gabon', 'Libreville', '1545255'],
 ['United Kingdom', 'London', '62348447'],
 ['Grenada', "St. George's", '107818'],
 ['Georgia', 'Tbilisi', '4630000'],
 ['French Guiana', 'Cayenne', '195506'],
 ['Guernsey', 'St Peter Port', '65228'],
 ['Ghana', 'Accra', '24339838'],
 ['Gibraltar', 'Gibraltar', '27884'],
 ['Greenland', 'Nuuk', '56375'],
 ['Gambia', 'Bathurst', '1593256'],
 ['Guinea', 'Conakry', '10324025'],
 ['Guadeloupe', 'Basse-Terre', '443000'],
 ['Equatorial Guinea', 'Malabo', '1014999'],
 ['Greece', 'Athens', '11000000'],
 ['South Georgia and the South Sandwich Islands', 'Grytviken', '30'],
 ['Guatemala', 'Guatemala City', '13550440'],
 ['Guam', 'Hagåtña', '159358'],
 ['Guinea-Bissau', 'Bissau', '1565126'],
 ['Guyana', 'Georgetown', '748486'],
 ['Hong Kong', 'Hong Kong', '6898686'],
 ['Heard Island and McDonald Islands', 'None', '0'],
 ['Honduras', 'Tegucigalpa', '7989415'],
 ['Croatia', 'Zagreb', '4491000'],
 ['Haiti', 'Port-au-Prince', '9648924'],
 ['Hungary', 'Budapest', '9982000'],
 ['Indonesia', 'Jakarta', '242968342'],
 ['Ireland', 'Dublin', '4622917'],
 ['Israel', 'None', '7353985'],
 ['Isle of Man', 'Douglas', '75049'],
 ['India', 'New Delhi', '1173108018'],
 ['British Indian Ocean Territory', 'None', '4000'],
 ['Iraq', 'Baghdad', '29671605'],
 ['Iran', 'Tehran', '76923300'],
 ['Iceland', 'Reykjavik', '308910'],
 ['Italy', 'Rome', '60340328'],
 ['Jersey', 'Saint Helier', '90812'],
 ['Jamaica', 'Kingston', '2847232'],
 ['Jordan', 'Amman', '6407085'],
 ['Japan', 'Tokyo', '127288000'],
 ['Kenya', 'Nairobi', '40046566'],
 ['Kyrgyzstan', 'Bishkek', '5776500'],
 ['Cambodia', 'Phnom Penh', '14453680'],
 ['Kiribati', 'Tarawa', '92533'],
 ['Comoros', 'Moroni', '773407'],
 ['Saint Kitts and Nevis', 'Basseterre', '51134'],
 ['North Korea', 'Pyongyang', '22912177'],
 ['South Korea', 'Seoul', '48422644'],
 ['Kuwait', 'Kuwait City', '2789132'],
 ['Cayman Islands', 'George Town', '44270'],
 ['Kazakhstan', 'Astana', '15340000'],
 ['Laos', 'Vientiane', '6368162'],
 ['Lebanon', 'Beirut', '4125247'],
 ['Saint Lucia', 'Castries', '160922'],
 ['Liechtenstein', 'Vaduz', '35000'],
 ['Sri Lanka', 'Colombo', '21513990'],
 ['Liberia', 'Monrovia', '3685076'],
 ['Lesotho', 'Maseru', '1919552'],
 ['Lithuania', 'Vilnius', '2944459'],
 ['Luxembourg', 'Luxembourg', '497538'],
 ['Latvia', 'Riga', '2217969'],
 ['Libya', 'Tripoli', '6461454'],
 ['Morocco', 'Rabat', '31627428'],
 ['Monaco', 'Monaco', '32965'],
 ['Moldova', 'Chişinău', '4324000'],
 ['Montenegro', 'Podgorica', '666730'],
 ['Saint Martin', 'Marigot', '35925'],
 ['Madagascar', 'Antananarivo', '21281844'],
 ['Marshall Islands', 'Majuro', '65859'],
 ['Macedonia', 'Skopje', '2062294'],
 ['Mali', 'Bamako', '13796354'],
 ['Myanmar [Burma]', 'Naypyitaw', '53414374'],
 ['Mongolia', 'Ulan Bator', '3086918'],
 ['Macao', 'Macao', '449198'],
 ['Northern Mariana Islands', 'Saipan', '53883'],
 ['Martinique', 'Fort-de-France', '432900'],
 ['Mauritania', 'Nouakchott', '3205060'],
 ['Montserrat', 'Plymouth', '9341'],
 ['Malta', 'Valletta', '403000'],
 ['Mauritius', 'Port Louis', '1294104'],
 ['Maldives', 'Malé', '395650'],
 ['Malawi', 'Lilongwe', '15447500'],
 ['Mexico', 'Mexico City', '112468855'],
 ['Malaysia', 'Kuala Lumpur', '28274729'],
 ['Mozambique', 'Maputo', '22061451'],
 ['Namibia', 'Windhoek', '2128471'],
 ['New Caledonia', 'Noumea', '216494'],
 ['Niger', 'Niamey', '15878271'],
 ['Norfolk Island', 'Kingston', '1828'],
 ['Nigeria', 'Abuja', '154000000'],
 ['Nicaragua', 'Managua', '5995928'],
 ['Netherlands', 'Amsterdam', '16645000'],
 ['Norway', 'Oslo', '5009150'],
 ['Nepal', 'Kathmandu', '28951852'],
 ['Nauru', 'Yaren', '10065'],
 ['Niue', 'Alofi', '2166'],
 ['New Zealand', 'Wellington', '4252277'],
 ['Oman', 'Muscat', '2967717'],
 ['Panama', 'Panama City', '3410676'],
 ['Peru', 'Lima', '29907003'],
 ['French Polynesia', 'Papeete', '270485'],
 ['Papua New Guinea', 'Port Moresby', '6064515'],
 ['Philippines', 'Manila', '99900177'],
 ['Pakistan', 'Islamabad', '184404791'],
 ['Poland', 'Warsaw', '38500000'],
 ['Saint Pierre and Miquelon', 'Saint-Pierre', '7012'],
 ['Pitcairn Islands', 'Adamstown', '46'],
 ['Puerto Rico', 'San Juan', '3916632'],
 ['Palestine', 'None', '3800000'],
 ['Portugal', 'Lisbon', '10676000'],
 ['Palau', 'Melekeok', '19907'],
 ['Paraguay', 'Asunción', '6375830'],
 ['Qatar', 'Doha', '840926'],
 ['Réunion', 'Saint-Denis', '776948'],
 ['Romania', 'Bucharest', '21959278'],
 ['Serbia', 'Belgrade', '7344847'],
 ['Russia', 'Moscow', '140702000'],
 ['Rwanda', 'Kigali', '11055976'],
 ['Saudi Arabia', 'Riyadh', '25731776'],
 ['Solomon Islands', 'Honiara', '559198'],
 ['Seychelles', 'Victoria', '88340'],
 ['Sudan', 'Khartoum', '35000000'],
 ['Sweden', 'Stockholm', '9828655'],
 ['Singapore', 'Singapore', '4701069'],
 ['Saint Helena', 'Jamestown', '7460'],
 ['Slovenia', 'Ljubljana', '2007000'],
 ['Svalbard and Jan Mayen', 'Longyearbyen', '2550'],
 ['Slovakia', 'Bratislava', '5455000'],
 ['Sierra Leone', 'Freetown', '5245695'],
 ['San Marino', 'San Marino', '31477'],
 ['Senegal', 'Dakar', '12323252'],
 ['Somalia', 'Mogadishu', '10112453'],
 ['Suriname', 'Paramaribo', '492829'],
 ['South Sudan', 'Juba', '8260490'],
 ['São Tomé and Príncipe', 'São Tomé', '175808'],
 ['El Salvador', 'San Salvador', '6052064'],
 ['Sint Maarten', 'Philipsburg', '37429'],
 ['Syria', 'Damascus', '22198110'],
 ['Swaziland', 'Mbabane', '1354051'],
 ['Turks and Caicos Islands', 'Cockburn Town', '20556'],
 ['Chad', "N'Djamena", '10543464'],
 ['French Southern Territories', 'Port-aux-Français', '140'],
 ['Togo', 'Lomé', '6587239'],
 ['Thailand', 'Bangkok', '67089500'],
 ['Tajikistan', 'Dushanbe', '7487489'],
 ['Tokelau', 'None', '1466'],
 ['East Timor', 'Dili', '1154625'],
 ['Turkmenistan', 'Ashgabat', '4940916'],
 ['Tunisia', 'Tunis', '10589025'],
 ['Tonga', "Nuku'alofa", '122580'],
 ['Turkey', 'Ankara', '77804122'],
 ['Trinidad and Tobago', 'Port of Spain', '1228691'],
 ['Tuvalu', 'Funafuti', '10472'],
 ['Taiwan', 'Taipei', '22894384'],
 ['Tanzania', 'Dodoma', '41892895'],
 ['Ukraine', 'Kiev', '45415596'],
 ['Uganda', 'Kampala', '33398682'],
 ['U.S. Minor Outlying Islands', 'None', '0'],
 ['United States', 'Washington', '310232863'],
 ['Uruguay', 'Montevideo', '3477000'],
 ['Uzbekistan', 'Tashkent', '27865738'],
 ['Vatican City', 'Vatican City', '921'],
 ['Saint Vincent and the Grenadines', 'Kingstown', '104217'],
 ['Venezuela', 'Caracas', '27223228'],
 ['British Virgin Islands', 'Road Town', '21730'],
 ['U.S. Virgin Islands', 'Charlotte Amalie', '108708'],
 ['Vietnam', 'Hanoi', '89571130'],
 ['Vanuatu', 'Port Vila', '221552'],
 ['Wallis and Futuna', 'Mata-Utu', '16025'],
 ['Samoa', 'Apia', '192001'],
 ['Kosovo', 'Pristina', '1800000'],
 ['Yemen', 'Sanaa', '23495361'],
 ['Mayotte', 'Mamoudzou', '159042'],
 ['South Africa', 'Pretoria', '49000000'],
 ['Zambia', 'Lusaka', '13460305'],
 ['Zimbabwe', 'Harare', '11651858']]
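
To see why the original loop kept only the last record, here is a minimal sketch that needs no network access. It assumes three rows copied from the output above in place of the scraped elements; the names records, last_only and all_rows are made up for illustration.

```python
# Three rows copied from the output above, standing in for the scraped elements.
records = [
    ("Andorra", "Andorra la Vella", "84000"),
    ("Zambia", "Lusaka", "13460305"),
    ("Zimbabwe", "Harare", "11651858"),
]

# Assigning inside the loop: each pass replaces the previous value,
# so only the final record is left once the loop ends.
for name, capital, population in records:
    last_only = [name, capital, population]

# Appending to a list created before the loop: each pass adds one row.
all_rows = []
for name, capital, population in records:
    all_rows.append([name, capital, population])

print(last_only)      # ['Zimbabwe', 'Harare', '11651858']
print(len(all_rows))  # 3
```

The same pattern applies to the scraping loop: the list has to exist before the loop starts, and the loop body should only add to it.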

CodePudding user response:

You're reassigning the variable data on every iteration of the loop, so only the last value survives. Define a list before the loop to store all the records:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys= soup.find_all('div',class_='col-md-4 country')

data = []

for country in countrys:
    country_name = country.find('h3',class_='country-name').text.strip()
    capital = country.find('span',class_='country-capital').text
    population = country.find('span',class_='country-population').text

    data.append([country_name, capital, population])

print(data)

Or, better yet, use a dictionary keyed by country name, which makes looking up a given country easier:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys= soup.find_all('div',class_='col-md-4 country')

data = {}

for country in countrys:
    country_name = country.find('h3',class_='country-name').text.strip()
    capital = country.find('span',class_='country-capital').text
    population = country.find('span',class_='country-population').text

    data[country_name] = {'capital': capital, 'population': population}

print(data)
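
As a rough illustration of what the dictionary form makes easier, the sketch below assumes data has the shape built above, filled with three entries taken from the earlier output. The variable name largest is made up for the example. Note that the scraped population is a string, so it is converted with int() before comparing.

```python
# A small sample in the same shape as the dictionary built above.
data = {
    "Andorra": {"capital": "Andorra la Vella", "population": "84000"},
    "China": {"capital": "Beijing", "population": "1330044000"},
    "Zimbabwe": {"capital": "Harare", "population": "11651858"},
}

# Look up one country directly by name instead of scanning a list.
print(data["Zimbabwe"]["capital"])  # Harare

# The population values are text, so convert them before comparing.
largest = max(data, key=lambda name: int(data[name]["population"]))
print(largest)  # China
```

One caveat with this design: dictionary keys are unique, so if two entries on a page shared the same name, the later one would overwrite the earlier one.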