How to get specific columns from a wiki table

Time:03-16

Basically, I have the table on this page: https://en.wikipedia.org/wiki/List_of_cakes and I want to grab the text from the first, third and fourth columns and format it like this:

Amandine - Romania - Chocolate layered cake filled with chocolate, caramel and fondant cream

So far I have this bit of code, which I modified from this post: How do I extract text data in first column from Wikipedia table?

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_cakes"

res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    data = items.get_text(strip=True)
    print(data)

Which outputs

AmandineRomaniaChocolate layered cake filled with chocolate, caramel and fondant cream
AmygdalopitaGreeceAlmond cake made with ground almonds, flour, butter, egg and pastry cream
Angel cakeUnited Kingdom[1]Sponge cake,cream,food colouring
Angel food cakeUnited StatesEgg whites, vanilla, andcream of tartar
etc...

I am just trying to scrape this wiki page into a text file, so that when someone on my Twitch channel uses the command !cake it picks one at random.

CodePudding user response:

You are close to your goal: call find_all('td') on each row and pick the cells you need by index from the resulting ResultSet:

for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    e = items.find_all('td')
    data = f'{e[0].text.strip()} - {e[2].text.strip()} - {e[3].text.strip()}'
    print(data)

Or use a list comprehension:

for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    print(' - '.join([items.find_all('td')[i].get_text(strip=True) for i in [0,2,3]]))
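Both variants assume that every row has at least four td cells; a shorter row would raise an IndexError, and footnote markers such as [1] stay in the text. As a minimal sketch of one way to handle both, the helper below works on a plain list of cell strings. The name format_row, its parameters and the footnote regex are illustrative assumptions, not part of the original answer:

```python
import re

# Matches Wikipedia-style footnote markers such as "[1]" or "[23]".
FOOTNOTE = re.compile(r"\[\d+\]")

def format_row(cells, indices=(0, 2, 3), sep=" - "):
    """Join the selected cells with sep, or return None if the row is too short.

    cells   -- list of cell texts for one table row (hypothetical input shape)
    indices -- zero-based column positions to keep
    """
    if len(cells) <= max(indices):
        return None  # header row or malformed row: nothing to format
    # Drop footnote markers, then trim surrounding whitespace.
    picked = (FOOTNOTE.sub("", cells[i]).strip() for i in indices)
    return sep.join(picked)
```

Inside the loop it could be called as format_row([td.text for td in items.find_all('td')]), skipping the row whenever the result is None.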

Example

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_cakes"

res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    e = items.find_all('td')
    data = f'{e[0].text.strip()} - {e[2].text.strip()} - {e[3].text.strip()}'
    print(data)

Output

Amandine - Romania - Chocolate layered cake filled with chocolate, caramel and fondant cream
Amygdalopita - Greece - Almond cake made with ground almonds, flour, butter, egg and pastry cream
Angel cake - United Kingdom[1] - Sponge cake, cream, food colouring
Angel food cake - United States - Egg whites, vanilla, and cream of tartar
Apple cake - Germany - Apple, caramel icing
Applesauce cake - Early colonial times in the New England Colonies of the Northeastern United States[2] - Prepared using apple sauce, flour and sugar as primary ingredients
Aranygaluska - Hungary - A cake with yeasty dough and vanilla custard
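For the last step mentioned in the question (a text file from which the !cake command picks one entry at random), a minimal sketch using only the standard library might look like this. The function names and the file name cakes.txt in the usage note are hypothetical, and the bot integration itself is left out:

```python
import random
from pathlib import Path

def save_lines(lines, path):
    """Write one entry per line, UTF-8 encoded."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

def random_line(path):
    """Return one non-empty line from the file, chosen uniformly at random."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    return random.choice(lines)
```

Instead of printing each data string, collect them in a list, call save_lines(collected, "cakes.txt") once after scraping, and let the command handler return random_line("cakes.txt").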