bs4.element.ResultSet elements to a list

Time:10-03

I want to put the categories of a website into a list. The categories come back as a bs4.element.ResultSet, and when I use a for loop to append its elements to a list, I don't get individual words. Instead, the list contains the words mixed with newline and space characters, like this:

['\n\n\n                            \n                                Books\n                            \n                        \n\n\n\n                            \n                                Travel\n      
                   \n                        \n\n\n\n            \n                                Mystery\n  ]  

I want to append only the words to the list so that I can insert them into an SQLite table. Here is the code:

from sqlite3.dbapi2 import connect
from typing import List
import requests
from bs4 import BeautifulSoup
import sqlite3
import lxml

response = requests.get('https://books.toscrape.com/')
# all html&css content-
soup = BeautifulSoup(response.text, 'lxml')
categories = soup.findAll("ul", class_ = 'nav nav-list' )
list = []

for i in categories:
    list.append(i.text)
print(list)

CodePudding user response:

Solution

You can pass strip=True to get_text():

import requests
from bs4 import BeautifulSoup
url = 'https://books.toscrape.com/'
response = requests.get(url)
# all html&css content-
soup = BeautifulSoup(response.text, 'lxml')
categories = soup.select('ul.nav.nav-list li a' )
list = []

for i in categories:
    list.append(i.get_text(strip=True))
print(list)

Output

['Books', 'Travel', 'Mystery', 'Historical Fiction', 'Sequential Art', 'Classics', 'Philosophy', 'Romance', 'Womens Fiction', 'Fiction', 'Childrens', 'Religion', 'Nonfiction', 'Music', 'Default', 'Science Fiction', 'Sports and Games', 'Add a comment', 'Fantasy', 'New Adult', 'Young Adult', 'Science', 'Poetry', 'Paranormal', 'Art', 'Psychology', 'Autobiography', 'Parenting', 'Adult Fiction', 'Humor', 'Horror', 'History', 'Food and Drink', 'Christian Fiction', 'Business', 'Biography', 'Thriller', 'Contemporary', 'Spirituality', 'Academic', 'Self Help', 'Historical', 'Christian', 'Suspense', 'Short Stories', 'Novels', 'Health', 'Politics', 'Cultural', 'Erotica', 'Crime']

You may also want to take a look at your selector. This one is more specific:

soup.select('ul.nav.nav-list li a')
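
Since the goal in the question is to store the names in an SQLite table, here is a minimal sketch of that step. The table name, column names, and the in-memory database are assumptions made for illustration, and the hard-coded list stands in for the scraped result:

```python
import sqlite3

# Stand-in for the scraped category names (hard-coded so the sketch runs offline).
categories = ["Books", "Travel", "Mystery", "Historical Fiction"]

# ":memory:" creates a throwaway database; use a file path to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)

# executemany expects one parameter tuple per row, hence the (name,) wrapping.
conn.executemany(
    "INSERT INTO categories (name) VALUES (?)",
    [(name,) for name in categories],
)
conn.commit()

# Read the rows back in insertion order to confirm what was stored.
stored = [row[0] for row in conn.execute("SELECT name FROM categories ORDER BY id")]
print(stored)
conn.close()
```

Using ? placeholders instead of string formatting lets sqlite3 handle quoting, which matters for names that contain apostrophes.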

CodePudding user response:

It has to do with targeting the specific parts of your HTML. Would something like this work?

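import requests
from bs4 import BeautifulSoup

response = requests.get('https://books.toscrape.com/')
# all html&css content-
soup = BeautifulSoup(response.content, 'html')
# drill down from the sidebar <ul> to the <a> tags of the nested category list
categories = soup.find("ul", class_ = 'nav nav-list' ).find('li').find('ul').find_all('a')

list = []

for i in categories:
    if i:
        # strip() removes the whitespace surrounding each link text
        list.append(i.text.strip())
print(list)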
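
To see why the original loop produced one long string, here is a small offline sketch. The HTML fragment is a made-up, shortened stand-in for the site's sidebar (an assumption, not the real markup), and html.parser is used only so that no extra parser has to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the category sidebar.
html = """
<ul class="nav nav-list">
  <li>
    <a href="books/index.html">
        Books
    </a>
    <ul>
      <li><a href="travel/index.html">
          Travel
      </a></li>
      <li><a href="historical/index.html">
          Historical Fiction
      </a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching the outer <ul> returns a ResultSet with a single element,
# so its .text is the text of every link plus all surrounding whitespace.
whole_list = soup.find_all("ul", class_="nav nav-list")
print(len(whole_list))
print(repr(whole_list[0].text))

# Going down to the individual <a> tags gives one element per category.
links = whole_list[0].find_all("a")
names = [a.get_text(strip=True) for a in links]
print(names)
```

The ResultSet in the question contains just the one ul element, which is why appending i.text adds a single whitespace-laden string instead of one entry per category.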

CodePudding user response:

You need to strip the unnecessary characters. Here I use split(), which splits a string on whitespace, so you essentially get back only the words:

from sqlite3.dbapi2 import connect
from typing import List
import requests
from bs4 import BeautifulSoup
import sqlite3
import lxml

response = requests.get('https://books.toscrape.com/')
# all html&css content-
soup = BeautifulSoup(response.text, 'lxml')
categories = soup.findAll("ul", class_ = 'nav nav-list' )
list = []

for i in categories:
    # extend() adds each word individually; plain assignment would keep only the last element's words
    list.extend(i.text.split())
print(list)

Also, using append here would give you a list within a list. You want to extend the list as shown, because each element of the result can yield a list of words.
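
One caveat with this approach: split() with no arguments splits on every run of whitespace, so multi-word names are broken apart. A rough sketch of the difference, using a made-up string that imitates the whitespace-padded text from the question:

```python
# Made-up sample imitating the whitespace-padded text of the <ul> element.
raw = "\n\n   Books\n   \n\n   Travel\n  \n   Historical Fiction\n  "

# split() breaks on any whitespace, including the space inside a name.
words = raw.split()
print(words)  # ['Books', 'Travel', 'Historical', 'Fiction']

# Splitting on line breaks and stripping each line keeps names intact.
names = [line.strip() for line in raw.splitlines() if line.strip()]
print(names)  # ['Books', 'Travel', 'Historical Fiction']
```

The line-based variant assumes each category sits on its own line in the text, which holds for the output shown in the question.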
