How do I write each <a> element into its own file and use the H2 text as the filename?
import re
import requests
from bs4 import BeautifulSoup
import os
data = '<html><div class="colors"> <a href="/green"> <div > GRN <h2 > Green </h2> </div> </a> <a href="/purple"> <div > PURP <h2 > Purple </h2> </div> </a> <a href="/orange"> <div > ORNG <h2 > Orange </h2> </div> </a> </div></html>'
soup = BeautifulSoup(data, "html.parser")
colors = soup.find("div", {"class": "colors"})
for lines in colors:
    doc = lines.find("h2").text.strip()
    file = open('C:/Users/Admin/Desktop/' + str(doc) + '.txt', 'a', encoding='utf-8')
    file.write(str(lines))
    file.close()
I am looking for one file per link, named after the H2 text, with that link's HTML inside:
Green.txt
<a href="/green"> <div > GRN <h2 > Green </h2> </div> </a>
Purple.txt
<a href="/purple"> <div > PURP <h2 > Purple </h2> </div> </a>
Orange.txt
<a href="/orange"> <div > ORNG <h2 > Orange </h2> </div> </a>
CodePudding user response:
If I understand you correctly, you have to iterate over the <a> elements instead of over the <div> with class colors to reach your goal:
for e in soup.select('.colors a'):
    # The stripped <h2> text becomes the filename
    name = e.h2.get_text(strip=True)
    html = str(e)
    with open(name + '.txt', 'a', encoding='utf-8') as file:
        file.write(html)
Example
from bs4 import BeautifulSoup
import os
data = '<html><div class="colors"> <a href="/green"> <div > GRN <h2 > Green </h2> </div> </a> <a href="/purple"> <div > PURP <h2 > Purple </h2> </div> </a> <a href="/orange"> <div > ORNG <h2 > Orange </h2> </div> </a> </div></html>'
soup = BeautifulSoup(data, "html.parser")
for e in soup.select('.colors a'):
    # The stripped <h2> text becomes the filename
    name = e.h2.get_text(strip=True)
    html = str(e)
    with open(name + '.txt', 'a', encoding='utf-8') as file:
        file.write(html)
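To see why looping over the <div> itself does not behave as expected, it can help to look at what that loop actually yields. The following is only a small sketch: the shortened data string is an assumption made for illustration, not the asker's exact markup, and the code does nothing more than count the direct children by type.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Shortened, illustrative markup (an assumption, not the original data)
data = '<div class="colors"> <a href="/green"><h2>Green</h2></a> <a href="/purple"><h2>Purple</h2></a> </div>'
soup = BeautifulSoup(data, "html.parser")
colors = soup.find("div", {"class": "colors"})

# Iterating over a Tag yields all direct children, not only elements
children = list(colors)
text_nodes = [c for c in children if isinstance(c, NavigableString)]
tags = [c for c in children if isinstance(c, Tag)]

print(len(children), len(text_nodes), len(tags))
```

Only the Tag children contain an h2 to look up. The whitespace between the links shows up as NavigableString children, which is why selecting the a elements directly is the more reliable approach.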
CodePudding user response:
You can use the find_all method to get all the a tags, then iterate over them and take the filename from each h2 tag to get the desired output:
# colors is the <div class="colors"> element found in the question's code
links = colors.find_all("a")
for link in links:
    fname = link.find("h2").get_text(strip=True)
    with open(fname + ".txt", "w", encoding="utf-8") as wr:
        wr.write(str(link))
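If the files should end up in a specific folder, as in the question, one possible way to combine the ideas above is sketched here. The helper names save_links and safe_name, the out_dir parameter, the character replacement rule, and the shortened data string are all hypothetical and only serve the illustration. The sketch writes into a temporary directory so it can be run without touching the Desktop.

```python
import re
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def safe_name(text):
    # Hypothetical rule: replace characters that Windows does not allow in filenames
    return re.sub(r'[\\/:*?"<>|]', "_", text).strip() or "untitled"


def save_links(html, out_dir):
    """Write each <a> inside .colors to <out_dir>/<h2 text>.txt and return the paths."""
    soup = BeautifulSoup(html, "html.parser")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for link in soup.select(".colors a"):
        heading = link.find("h2")
        if heading is None:
            continue  # skip links that have no <h2> to name the file after
        path = out_dir / (safe_name(heading.get_text(strip=True)) + ".txt")
        path.write_text(str(link), encoding="utf-8")
        written.append(path)
    return written


# Shortened, illustrative markup (an assumption, not the original data)
data = '<div class="colors"> <a href="/green"> <h2> Green </h2> </a> <a href="/purple"> <h2> A/B </h2> </a> </div>'
with tempfile.TemporaryDirectory() as tmp:
    names = sorted(p.name for p in save_links(data, tmp))

print(names)
```

Passing a real folder such as the Desktop path from the question as out_dir would write the files there instead. Note that write_text overwrites an existing file, unlike the append mode ('a') used in the question.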