I have an HTML website without any tables, and I want to scrape its data into the form of a table. Here is the sample HTML code:
<div class='ah-content'>
<h4>XYZ Community</h4>
<p>123 Street</p>
<p>Atlanta, Georgia, 12345</p>
<p>1234567890</p>
</div>
It is a long list of blocks like this, and I want to capture the <h4> and <p> elements inside each <div>.
So, the output will be:
Name | Address | Address2 | Phone |
---|---|---|---|
XYZ Community | 123 Street | Atlanta, Georgia, 12345 | 1234567890 |
CodePudding user response:
If every <div class='ah-content'> follows the same pattern as in your example, you can use this script to create a DataFrame:
import pandas as pd
from bs4 import BeautifulSoup
html_doc = """\
<div class='ah-content'>
<h4>XYZ Community</h4>
<p>123 Street</p>
<p>Atlanta, Georgia, 12345</p>
<p>1234567890</p>
</div>"""
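# Parse the HTML; "html.parser" is the parser bundled with Python.
soup = BeautifulSoup(html_doc, "html.parser")
# One row per .ah-content block: the text of every tag inside it (<h4>, then the <p> tags).
strings = [[t.text for t in c.find_all()] for c in soup.select(".ah-content")]
df = pd.DataFrame(strings, columns=["Name", "Address", "Address2", "Phone"])
# Note: to_markdown() requires the optional "tabulate" package.
print(df.to_markdown(index=False))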
Prints:
Name | Address | Address2 | Phone |
---|---|---|---|
XYZ Community | 123 Street | Atlanta, Georgia, 12345 | 1234567890 |
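The script above relies on every block containing exactly one <h4> and three <p> tags. If some blocks in the real page are shorter, the rows will no longer line up with the four column names. Below is a minimal sketch of one way to handle that, by padding missing paragraphs with None. The function name blocks_to_frame and the second sample block ("ABC Community", with no phone number) are made up for illustration and are not from the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Name", "Address", "Address2", "Phone"]


def blocks_to_frame(html: str) -> pd.DataFrame:
    """Build one row per .ah-content block, padding short blocks with None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.select("div.ah-content"):
        heading = block.find("h4")
        name = heading.get_text(strip=True) if heading else None
        paragraphs = [p.get_text(strip=True) for p in block.find_all("p")]
        # Keep at most three paragraphs; pad with None when fewer are present.
        paragraphs = (paragraphs + [None] * 3)[:3]
        rows.append([name, *paragraphs])
    return pd.DataFrame(rows, columns=COLUMNS)


# Hypothetical sample: the second block is missing its phone number.
sample_html = """
<div class='ah-content'>
<h4>XYZ Community</h4>
<p>123 Street</p>
<p>Atlanta, Georgia, 12345</p>
<p>1234567890</p>
</div>
<div class='ah-content'>
<h4>ABC Community</h4>
<p>456 Avenue</p>
<p>Atlanta, Georgia, 12345</p>
</div>
"""

df = blocks_to_frame(sample_html)
print(df)
```

Padding by position assumes that only trailing fields go missing. If a block could omit a middle line instead (for example the second address line), the values would shift into the wrong columns, and you would need some other signal, such as a pattern check on the phone number, to place them.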