Create multiple files with filenames from bs4

Time:08-16

How do I write each <a> element to its own file, using the text of its <h2> as the filename?

import re
import requests
from bs4 import BeautifulSoup
import os

data = '<html><div class="colors"> <a href="/green"> <div > GRN <h2 > Green </h2> </div> </a> <a href="/purple"> <div > PURP <h2 > Purple </h2> </div> </a> <a href="/orange"> <div > ORNG <h2 > Orange </h2> </div> </a> </div></html>'
soup = BeautifulSoup(data, "html.parser")

colors = soup.find("div", {"class": "colors"})

for lines in colors:
    docs = lines.find("h2").text.strip()
    file = open('C:/Users/Admin/Desktop/' + str(docs) + '.txt', 'a', encoding='utf-8')
    file.write(str(lines))
    file.close()

The expected result is one file per link, named after the <h2> text and containing that link's HTML:

Green.txt <a href="/green"> <div > GRN <h2 > Green </h2> </div> </a>

Purple.txt <a href="/purple"> <div > PURP <h2 > Purple </h2> </div> </a>

Orange.txt <a href="/orange"> <div > ORNG <h2 > Orange </h2> </div> </a>

CodePudding user response:

If I understand correctly, you need to iterate over the <a> elements rather than over the <div> with class colors:

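for e in soup.select('.colors a'):
    # use the stripped <h2> text as the filename
    name = e.h2.get_text(strip=True)
    html = str(e)
    # 'a' appends, so running the script twice duplicates the content
    with open(name + '.txt', 'a', encoding='utf-8') as file:
        file.write(html)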

Example

from bs4 import BeautifulSoup
import os

data = '<html><div class="colors"> <a href="/green"> <div > GRN <h2 > Green </h2> </div> </a> <a href="/purple"> <div > PURP <h2 > Purple </h2> </div> </a> <a href="/orange"> <div > ORNG <h2 > Orange </h2> </div> </a> </div></html>'
soup = BeautifulSoup(data, "html.parser")
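
for e in soup.select('.colors a'):
    # use the stripped <h2> text as the filename
    name = e.h2.get_text(strip=True)
    html = str(e)
    # 'a' appends, so running the script twice duplicates the content
    with open(name + '.txt', 'a', encoding='utf-8') as file:
        file.write(html)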
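
To see why looping over the <div> itself does not work, here is a small sketch. It assumes a shortened version of the markup from the question (the sample string and the variable names are made up for illustration). Iterating a Tag yields all of its direct children, including the whitespace text nodes between the links, and those text nodes are strings, not tags, so they have no <h2> to look up.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Shortened, hypothetical version of the question's markup.
data = (
    '<div class="colors"> '
    '<a href="/green"><h2>Green</h2></a> '
    '<a href="/purple"><h2>Purple</h2></a> '
    '</div>'
)
soup = BeautifulSoup(data, "html.parser")
colors = soup.find("div", {"class": "colors"})

# Iterating the <div> walks its direct children: tags AND text nodes.
children = list(colors.children)
text_nodes = [c for c in children if isinstance(c, NavigableString)]

# Selecting the links directly returns only Tag objects.
anchors = colors.find_all("a")

print(len(children), len(text_nodes), len(anchors))
```

Because the whitespace between the links shows up as text nodes, selecting the <a> elements explicitly (with select or find_all) is the safer loop.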

CodePudding user response:

You can use the find_all method to get every <a> tag, then loop over the results, take the filename from each <h2> tag, and write the tag to a file to get the desired output:

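# colors is the <div class="colors"> element found in the question's code
links = colors.find_all("a")
for link in links:
    fname = link.find("h2").get_text(strip=True)
    # "w" overwrites the file on each run
    with open(fname + ".txt", "w", encoding="utf-8") as wr:
        wr.write(str(link))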
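
The same idea can be wrapped in a small function. This is only a sketch under some assumptions: write_links, out_dir and the filename clean-up rule are hypothetical and not part of either answer, and the example writes into a temporary directory instead of the Desktop path from the question.

```python
import re
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def write_links(html: str, out_dir: Path) -> list[Path]:
    """Write each <a> under .colors to '<h2 text>.txt' in out_dir (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for link in soup.select(".colors a"):
        heading = link.find("h2")
        if heading is None:
            continue  # skip links that have no <h2> to name the file after
        name = heading.get_text(strip=True)
        # Assumption: replace characters that Windows does not allow in filenames.
        safe = re.sub(r'[\\/:*?"<>|]', "_", name)
        path = out_dir / f"{safe}.txt"
        # write_text overwrites, like mode "w" in the answer above
        path.write_text(str(link), encoding="utf-8")
        written.append(path)
    return written


data = (
    '<html><div class="colors"> '
    '<a href="/green"> <div> GRN <h2> Green </h2> </div> </a> '
    '<a href="/purple"> <div> PURP <h2> Purple </h2> </div> </a> '
    '<a href="/orange"> <div> ORNG <h2> Orange </h2> </div> </a> '
    '</div></html>'
)

with tempfile.TemporaryDirectory() as tmp:
    paths = write_links(data, Path(tmp))
    names = sorted(p.name for p in paths)
    print(names)
```

Passing the output directory as a Path keeps the loop independent of the current working directory, which the plain open(fname + ".txt") calls above rely on.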