This is probably a newbie problem, but I cannot solve it. I found a couple of different web scraping scripts in YouTube tutorials, but each of them gives me only the last data point rather than a list of all of them, which is what I want. This is my code (using Jupyter Notebook):
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys = soup.find_all('div', class_='col-md-4 country')
for country in countrys:
    country_name = country.find('h3', class_='country-name').text.strip()
    capital = country.find('span', class_='country-capital').text
    population = country.find('span', class_='country-population').text
    data = [country_name, capital, population]
print(data)
Result:
['Zimbabwe', 'Harare', '11651858']
So the code produces only the last value of the data (the country list). How can I get a list of all the data?
User response:
You have to create the data variable as a list outside the loop and append each record to that list:
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys = soup.find_all('div', class_='col-md-4 country')
data = []  # <- HERE: create the list once, before the loop
for country in countrys:
    country_name = country.find('h3', class_='country-name').text.strip()
    capital = country.find('span', class_='country-capital').text
    population = country.find('span', class_='country-population').text
    data.append([country_name, capital, population])  # <- HERE: add one record per country
print(data)
Output:
[['Andorra', 'Andorra la Vella', '84000'],
['United Arab Emirates', 'Abu Dhabi', '4975593'],
['Afghanistan', 'Kabul', '29121286'],
['Antigua and Barbuda', "St. John's", '86754'],
['Anguilla', 'The Valley', '13254'],
['Albania', 'Tirana', '2986952'],
['Armenia', 'Yerevan', '2968000'],
['Angola', 'Luanda', '13068161'],
['Antarctica', 'None', '0'],
['Argentina', 'Buenos Aires', '41343201'],
['American Samoa', 'Pago Pago', '57881'],
['Austria', 'Vienna', '8205000'],
['Australia', 'Canberra', '21515754'],
['Aruba', 'Oranjestad', '71566'],
['Åland', 'Mariehamn', '26711'],
['Azerbaijan', 'Baku', '8303512'],
['Bosnia and Herzegovina', 'Sarajevo', '4590000'],
['Barbados', 'Bridgetown', '285653'],
['Bangladesh', 'Dhaka', '156118464'],
['Belgium', 'Brussels', '10403000'],
['Burkina Faso', 'Ouagadougou', '16241811'],
['Bulgaria', 'Sofia', '7148785'],
['Bahrain', 'Manama', '738004'],
['Burundi', 'Bujumbura', '9863117'],
['Benin', 'Porto-Novo', '9056010'],
['Saint Barthélemy', 'Gustavia', '8450'],
['Bermuda', 'Hamilton', '65365'],
['Brunei', 'Bandar Seri Begawan', '395027'],
['Bolivia', 'Sucre', '9947418'],
['Bonaire', 'Kralendijk', '18012'],
['Brazil', 'Brasília', '201103330'],
['Bahamas', 'Nassau', '301790'],
['Bhutan', 'Thimphu', '699847'],
['Bouvet Island', 'None', '0'],
['Botswana', 'Gaborone', '2029307'],
['Belarus', 'Minsk', '9685000'],
['Belize', 'Belmopan', '314522'],
['Canada', 'Ottawa', '33679000'],
['Cocos [Keeling] Islands', 'West Island', '628'],
['Democratic Republic of the Congo', 'Kinshasa', '70916439'],
['Central African Republic', 'Bangui', '4844927'],
['Republic of the Congo', 'Brazzaville', '3039126'],
['Switzerland', 'Bern', '7581000'],
['Ivory Coast', 'Yamoussoukro', '21058798'],
['Cook Islands', 'Avarua', '21388'],
['Chile', 'Santiago', '16746491'],
['Cameroon', 'Yaoundé', '19294149'],
['China', 'Beijing', '1330044000'],
['Colombia', 'Bogotá', '47790000'],
['Costa Rica', 'San José', '4516220'],
['Cuba', 'Havana', '11423000'],
['Cape Verde', 'Praia', '508659'],
['Curacao', 'Willemstad', '141766'],
['Christmas Island', 'Flying Fish Cove', '1500'],
['Cyprus', 'Nicosia', '1102677'],
['Czech Republic', 'Prague', '10476000'],
['Germany', 'Berlin', '81802257'],
['Djibouti', 'Djibouti', '740528'],
['Denmark', 'Copenhagen', '5484000'],
['Dominica', 'Roseau', '72813'],
['Dominican Republic', 'Santo Domingo', '9823821'],
['Algeria', 'Algiers', '34586184'],
['Ecuador', 'Quito', '14790608'],
['Estonia', 'Tallinn', '1291170'],
['Egypt', 'Cairo', '80471869'],
['Western Sahara', 'Laâyoune / El Aaiún', '273008'],
['Eritrea', 'Asmara', '5792984'],
['Spain', 'Madrid', '46505963'],
['Ethiopia', 'Addis Ababa', '88013491'],
['Finland', 'Helsinki', '5244000'],
['Fiji', 'Suva', '875983'],
['Falkland Islands', 'Stanley', '2638'],
['Micronesia', 'Palikir', '107708'],
['Faroe Islands', 'Tórshavn', '48228'],
['France', 'Paris', '64768389'],
['Gabon', 'Libreville', '1545255'],
['United Kingdom', 'London', '62348447'],
['Grenada', "St. George's", '107818'],
['Georgia', 'Tbilisi', '4630000'],
['French Guiana', 'Cayenne', '195506'],
['Guernsey', 'St Peter Port', '65228'],
['Ghana', 'Accra', '24339838'],
['Gibraltar', 'Gibraltar', '27884'],
['Greenland', 'Nuuk', '56375'],
['Gambia', 'Bathurst', '1593256'],
['Guinea', 'Conakry', '10324025'],
['Guadeloupe', 'Basse-Terre', '443000'],
['Equatorial Guinea', 'Malabo', '1014999'],
['Greece', 'Athens', '11000000'],
['South Georgia and the South Sandwich Islands', 'Grytviken', '30'],
['Guatemala', 'Guatemala City', '13550440'],
['Guam', 'Hagåtña', '159358'],
['Guinea-Bissau', 'Bissau', '1565126'],
['Guyana', 'Georgetown', '748486'],
['Hong Kong', 'Hong Kong', '6898686'],
['Heard Island and McDonald Islands', 'None', '0'],
['Honduras', 'Tegucigalpa', '7989415'],
['Croatia', 'Zagreb', '4491000'],
['Haiti', 'Port-au-Prince', '9648924'],
['Hungary', 'Budapest', '9982000'],
['Indonesia', 'Jakarta', '242968342'],
['Ireland', 'Dublin', '4622917'],
['Israel', 'None', '7353985'],
['Isle of Man', 'Douglas', '75049'],
['India', 'New Delhi', '1173108018'],
['British Indian Ocean Territory', 'None', '4000'],
['Iraq', 'Baghdad', '29671605'],
['Iran', 'Tehran', '76923300'],
['Iceland', 'Reykjavik', '308910'],
['Italy', 'Rome', '60340328'],
['Jersey', 'Saint Helier', '90812'],
['Jamaica', 'Kingston', '2847232'],
['Jordan', 'Amman', '6407085'],
['Japan', 'Tokyo', '127288000'],
['Kenya', 'Nairobi', '40046566'],
['Kyrgyzstan', 'Bishkek', '5776500'],
['Cambodia', 'Phnom Penh', '14453680'],
['Kiribati', 'Tarawa', '92533'],
['Comoros', 'Moroni', '773407'],
['Saint Kitts and Nevis', 'Basseterre', '51134'],
['North Korea', 'Pyongyang', '22912177'],
['South Korea', 'Seoul', '48422644'],
['Kuwait', 'Kuwait City', '2789132'],
['Cayman Islands', 'George Town', '44270'],
['Kazakhstan', 'Astana', '15340000'],
['Laos', 'Vientiane', '6368162'],
['Lebanon', 'Beirut', '4125247'],
['Saint Lucia', 'Castries', '160922'],
['Liechtenstein', 'Vaduz', '35000'],
['Sri Lanka', 'Colombo', '21513990'],
['Liberia', 'Monrovia', '3685076'],
['Lesotho', 'Maseru', '1919552'],
['Lithuania', 'Vilnius', '2944459'],
['Luxembourg', 'Luxembourg', '497538'],
['Latvia', 'Riga', '2217969'],
['Libya', 'Tripoli', '6461454'],
['Morocco', 'Rabat', '31627428'],
['Monaco', 'Monaco', '32965'],
['Moldova', 'Chişinău', '4324000'],
['Montenegro', 'Podgorica', '666730'],
['Saint Martin', 'Marigot', '35925'],
['Madagascar', 'Antananarivo', '21281844'],
['Marshall Islands', 'Majuro', '65859'],
['Macedonia', 'Skopje', '2062294'],
['Mali', 'Bamako', '13796354'],
['Myanmar [Burma]', 'Naypyitaw', '53414374'],
['Mongolia', 'Ulan Bator', '3086918'],
['Macao', 'Macao', '449198'],
['Northern Mariana Islands', 'Saipan', '53883'],
['Martinique', 'Fort-de-France', '432900'],
['Mauritania', 'Nouakchott', '3205060'],
['Montserrat', 'Plymouth', '9341'],
['Malta', 'Valletta', '403000'],
['Mauritius', 'Port Louis', '1294104'],
['Maldives', 'Malé', '395650'],
['Malawi', 'Lilongwe', '15447500'],
['Mexico', 'Mexico City', '112468855'],
['Malaysia', 'Kuala Lumpur', '28274729'],
['Mozambique', 'Maputo', '22061451'],
['Namibia', 'Windhoek', '2128471'],
['New Caledonia', 'Noumea', '216494'],
['Niger', 'Niamey', '15878271'],
['Norfolk Island', 'Kingston', '1828'],
['Nigeria', 'Abuja', '154000000'],
['Nicaragua', 'Managua', '5995928'],
['Netherlands', 'Amsterdam', '16645000'],
['Norway', 'Oslo', '5009150'],
['Nepal', 'Kathmandu', '28951852'],
['Nauru', 'Yaren', '10065'],
['Niue', 'Alofi', '2166'],
['New Zealand', 'Wellington', '4252277'],
['Oman', 'Muscat', '2967717'],
['Panama', 'Panama City', '3410676'],
['Peru', 'Lima', '29907003'],
['French Polynesia', 'Papeete', '270485'],
['Papua New Guinea', 'Port Moresby', '6064515'],
['Philippines', 'Manila', '99900177'],
['Pakistan', 'Islamabad', '184404791'],
['Poland', 'Warsaw', '38500000'],
['Saint Pierre and Miquelon', 'Saint-Pierre', '7012'],
['Pitcairn Islands', 'Adamstown', '46'],
['Puerto Rico', 'San Juan', '3916632'],
['Palestine', 'None', '3800000'],
['Portugal', 'Lisbon', '10676000'],
['Palau', 'Melekeok', '19907'],
['Paraguay', 'Asunción', '6375830'],
['Qatar', 'Doha', '840926'],
['Réunion', 'Saint-Denis', '776948'],
['Romania', 'Bucharest', '21959278'],
['Serbia', 'Belgrade', '7344847'],
['Russia', 'Moscow', '140702000'],
['Rwanda', 'Kigali', '11055976'],
['Saudi Arabia', 'Riyadh', '25731776'],
['Solomon Islands', 'Honiara', '559198'],
['Seychelles', 'Victoria', '88340'],
['Sudan', 'Khartoum', '35000000'],
['Sweden', 'Stockholm', '9828655'],
['Singapore', 'Singapore', '4701069'],
['Saint Helena', 'Jamestown', '7460'],
['Slovenia', 'Ljubljana', '2007000'],
['Svalbard and Jan Mayen', 'Longyearbyen', '2550'],
['Slovakia', 'Bratislava', '5455000'],
['Sierra Leone', 'Freetown', '5245695'],
['San Marino', 'San Marino', '31477'],
['Senegal', 'Dakar', '12323252'],
['Somalia', 'Mogadishu', '10112453'],
['Suriname', 'Paramaribo', '492829'],
['South Sudan', 'Juba', '8260490'],
['São Tomé and Príncipe', 'São Tomé', '175808'],
['El Salvador', 'San Salvador', '6052064'],
['Sint Maarten', 'Philipsburg', '37429'],
['Syria', 'Damascus', '22198110'],
['Swaziland', 'Mbabane', '1354051'],
['Turks and Caicos Islands', 'Cockburn Town', '20556'],
['Chad', "N'Djamena", '10543464'],
['French Southern Territories', 'Port-aux-Français', '140'],
['Togo', 'Lomé', '6587239'],
['Thailand', 'Bangkok', '67089500'],
['Tajikistan', 'Dushanbe', '7487489'],
['Tokelau', 'None', '1466'],
['East Timor', 'Dili', '1154625'],
['Turkmenistan', 'Ashgabat', '4940916'],
['Tunisia', 'Tunis', '10589025'],
['Tonga', "Nuku'alofa", '122580'],
['Turkey', 'Ankara', '77804122'],
['Trinidad and Tobago', 'Port of Spain', '1228691'],
['Tuvalu', 'Funafuti', '10472'],
['Taiwan', 'Taipei', '22894384'],
['Tanzania', 'Dodoma', '41892895'],
['Ukraine', 'Kiev', '45415596'],
['Uganda', 'Kampala', '33398682'],
['U.S. Minor Outlying Islands', 'None', '0'],
['United States', 'Washington', '310232863'],
['Uruguay', 'Montevideo', '3477000'],
['Uzbekistan', 'Tashkent', '27865738'],
['Vatican City', 'Vatican City', '921'],
['Saint Vincent and the Grenadines', 'Kingstown', '104217'],
['Venezuela', 'Caracas', '27223228'],
['British Virgin Islands', 'Road Town', '21730'],
['U.S. Virgin Islands', 'Charlotte Amalie', '108708'],
['Vietnam', 'Hanoi', '89571130'],
['Vanuatu', 'Port Vila', '221552'],
['Wallis and Futuna', 'Mata-Utu', '16025'],
['Samoa', 'Apia', '192001'],
['Kosovo', 'Pristina', '1800000'],
['Yemen', 'Sanaa', '23495361'],
['Mayotte', 'Mamoudzou', '159042'],
['South Africa', 'Pretoria', '49000000'],
['Zambia', 'Lusaka', '13460305'],
['Zimbabwe', 'Harare', '11651858']]
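The difference between assigning and appending can be reproduced without any network access. The following is only a minimal sketch: `rows` is stand-in data using three records copied from the output above, and the function names `collect_last` and `collect_all` are hypothetical, introduced just for illustration.

```python
# Stand-in for the scraped records (values copied from the output above).
rows = [
    ("Andorra", "Andorra la Vella", "84000"),
    ("United Arab Emirates", "Abu Dhabi", "4975593"),
    ("Zimbabwe", "Harare", "11651858"),
]


def collect_last(rows):
    """Mimics the question's loop: `data` is reassigned on every pass."""
    data = []
    for name, capital, population in rows:
        data = [name, capital, population]  # overwrites the previous record
    return data


def collect_all(rows):
    """Mimics the fix: one list created before the loop, one append per record."""
    data = []
    for name, capital, population in rows:
        data.append([name, capital, population])  # keeps every record
    return data


print(collect_last(rows))  # only the final record survives
print(collect_all(rows))   # a list with one entry per record
```

With assignment, each pass throws away the previous value, so whatever is printed after the loop is the last record. With `append`, the list grows by one entry per pass.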
User response:
You're redefining the variable data on every iteration of the loop, so each pass overwrites the previous record. You need to define a variable before the loop to store all the data:
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys = soup.find_all('div', class_='col-md-4 country')
data = []  # defined once, outside the loop
for country in countrys:
    country_name = country.find('h3', class_='country-name').text.strip()
    capital = country.find('span', class_='country-capital').text
    population = country.find('span', class_='country-population').text
    data.append([country_name, capital, population])
print(data)
Or better yet, you can use dictionaries, which will make accessing the data easier:
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.scrapethissite.com/pages/simple/').text
soup = BeautifulSoup(html_text, 'lxml')
countrys = soup.find_all('div', class_='col-md-4 country')
data = {}  # maps country name -> details
for country in countrys:
    country_name = country.find('h3', class_='country-name').text.strip()
    capital = country.find('span', class_='country-capital').text
    population = country.find('span', class_='country-population').text
    data[country_name] = {'capital': capital, 'population': population}
print(data)
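To show what "easier access" might look like in practice, here is a small sketch of lookups on a dictionary shaped like the one built above. The two entries are sample values taken from the output earlier in the thread. Converting the population strings to `int` is an extra step assumed here for illustration, not something the original code does, and "Atlantis" is a made-up key used only to show a missing entry.

```python
# Sample of the dictionary the loop above would build (two entries only).
data = {
    "Andorra": {"capital": "Andorra la Vella", "population": "84000"},
    "Zimbabwe": {"capital": "Harare", "population": "11651858"},
}

# Direct lookup by country name, no need to search through a list.
capital = data["Zimbabwe"]["capital"]

# dict.get returns None instead of raising KeyError for an unknown country.
missing = data.get("Atlantis")

# The scraped populations are strings; convert them before comparing numbers.
populations = {name: int(info["population"]) for name, info in data.items()}
largest = max(populations, key=populations.get)

print(capital)
print(missing)
print(largest, populations[largest])
```

One caveat of keying by name: if two records ever shared the same country name, the later one would replace the earlier one.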