I wrote some code myself, and it goes like this:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
URL = 'https://www.books.toscrape/j/detail/164857/963108'
content = requests.get(URL).content
soup = BeautifulSoup(content,"html.parser")
phone = soup.find_all(text=re.compile("phone|phone"))
name = soup.find_all(text=re.compile("name|name"))
mail = soup.find_all(text=re.compile("mail|mail"))
df = pd.DataFrame([phone,name,mail,])
df.to_csv('D:\\products.csv', index=False, encoding='utf-8')
Yes, it looks weird. I hope these three find_all calls can be merged into one, like this:
F = soup.find_all(text=re.compile("phone|phone")),soup.find_all(text=re.compile("name|name")),soup.find_all(text=re.compile("mail|mail"))
Can anyone help me with this?
CodePudding user response:
One way to write it shorter is:
data = [soup.find_all(text=re.compile(pat)) for pat in ("phone", "name", "mail")]
df = pd.DataFrame(data)
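If you also want each row labelled with the keyword it came from, a dict works in place of the list. Below is a minimal sketch of that idea, not a drop-in for your page: the HTML string is a made-up sample standing in for requests.get(URL).content, and the names HTML, keywords and matches are only for illustration. It uses string= instead of text=, which is the newer name for the same argument in Beautiful Soup 4.4.0 and later.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the real response content.
HTML = """
<html><body>
  <p>name: Alice</p>
  <p>phone: 555-0100</p>
  <p>mail: alice@example.com</p>
  <p>phone: 555-0199</p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One find_all call per keyword, stored under that keyword.
keywords = ("phone", "name", "mail")
matches = {
    kw: [s.strip() for s in soup.find_all(string=re.compile(kw))]
    for kw in keywords
}

# orient="index" gives one row per keyword; shorter rows are padded
# with missing values so all rows have the same number of columns.
df = pd.DataFrame.from_dict(matches, orient="index")
print(df)
```

Note that a single pattern such as re.compile("phone|name|mail") would also reduce this to one find_all call, but then all matches land in one flat list and you lose track of which keyword each one matched.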