I gathered some information with BeautifulSoup4 from this webpage: https://www.peakbagger.com/list.aspx?lid=5651
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.peakbagger.com/list.aspx?lid=5651'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
row = soup.find('tr')
row
rows = soup.find_all('tr')
for row in rows:
    print(row.get_text())
I would like to print the text so that each column is shown in its own section, e.g.
before:
1.Fuji-sanKanto3776Yamanashi-ken/Shizuoka-kenHonshu3776318
2.Kita-dakeChubu3192Yamanashi-kenHonshu223731
after:
(a= )
Fuji-san
Kita-dake
...
(b=)
Kanto
Chubu
...
(c=)
3776
3192
...
for every line from 1. to 100.
Should I use a for loop or split() to break each line into its words?
Thank you.
CodePudding user response:
Try this:
import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
# Pass the headers so the request actually uses the custom User-Agent
r = requests.get("https://www.peakbagger.com/list.aspx?lid=5651", headers=headers)
soup = BeautifulSoup(r.content, "lxml")
table = soup.find("table", class_="gray")
header = [th.get_text(strip=True) for th in table.tr.select("th")][1:]
header.insert(0, 'S.No')
all_data = []
for row in table.select("tr:has(td)"):
    tds = [td.get_text(strip=True) for td in row.select("td")]
    all_data.append(tds)
df = pd.DataFrame(all_data, columns=header)
print(df)
df.to_csv("data.csv", index=False)
Output:
    S.No         Peak        Section  ...  Range (Level 3)  Prom-M  Ascents
0     1.     Fuji-san          Kanto  ...           Honshu    3776      318
1     2.    Kita-dake          Chubu  ...           Honshu    2237       31
2     3.  Hotaka-dake          Chubu  ...           Honshu    2305       23
3     4.    Aino-dake          Chubu  ...           Honshu     299       17
4     5.  Yariga-take          Chubu  ...           Honshu     432       29
..   ...          ...            ...  ...              ...     ...      ...
95   96.  Meakan-dake       Hokkaido  ...         Hokkaido    1147        7
96   97.    Amagi-san          Chubu  ...           Honshu    1004       14
97   98.   Ibuki-yama  Western Japan  ...           Honshu     657        8
98   99.  Kaimon-dake  Western Japan  ...           Kyushu     867        9
99  100.  Tsukuba-san          Kanto  ...           Honshu     797       20
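To get the separate per-column lists the question asks for (a, b, c), the per-cell lists can be transposed instead of splitting the concatenated row text. The sketch below is only an illustration: the `rows` sample is hypothetical, hand-copied from the first two lines in the question and trimmed to four cells, and the variable names are assumptions. It stands in for the `all_data` list built above.

```python
# Hypothetical sample: each inner list holds the cell texts of one table row,
# in the same shape as the per-row `tds` lists collected into all_data.
# Values come from the two example lines in the question; other cells are omitted.
rows = [
    ["1.", "Fuji-san", "Kanto", "3776"],
    ["2.", "Kita-dake", "Chubu", "3192"],
]

# zip(*rows) transposes the data: one tuple per column instead of one list per row.
numbers, peaks, sections, elevations = (list(col) for col in zip(*rows))

print(peaks)       # ['Fuji-san', 'Kita-dake']
print(sections)    # ['Kanto', 'Chubu']
print(elevations)  # ['3776', '3192']
```

With the DataFrame from the answer, a single column can be pulled out the same way with `df["Peak"].tolist()`. Note that the scraped values are strings, so numeric columns would still need converting (for example with `int`) before doing arithmetic on them.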