Extract words after scraping with BeautifulSoup

Time:10-13

I gathered some information with BeautifulSoup4 from this webpage: https://www.peakbagger.com/list.aspx?lid=5651

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.peakbagger.com/list.aspx?lid=5651'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

row = soup.find('tr') 
row

rows = soup.find_all('tr')
for row in rows:          
    print(row.get_text())

I would like to print the words so that each column is shown in its own section, e.g.

before:

1.Fuji-sanKanto3776Yamanashi-ken/Shizuoka-kenHonshu3776318

2.Kita-dakeChubu3192Yamanashi-kenHonshu223731

after:

(a= )

Fuji-san

Kita-dake

...

(b=)

Kanto

Chubu

...

(c=)

3776

3192

...

for each of the lines from 1. to 100.

Should I use a for loop or split() to separate the words?

Thank you.

CodePudding user response:

Try this:

import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
# Pass the headers defined above so the request actually uses the User-Agent
r = requests.get("https://www.peakbagger.com/list.aspx?lid=5651", headers=headers)
soup = BeautifulSoup(r.content, "lxml")
table = soup.find("table", class_="gray")
header = [th.get_text(strip=True) for th in table.tr.select("th")][1:]
header.insert(0, 'S.No')

all_data = []
for row in table.select("tr:has(td)"):
    tds = [td.get_text(strip=True) for td in row.select("td")]
    all_data.append(tds)

df = pd.DataFrame(all_data, columns=header)
print(df)
df.to_csv("data.csv", index=False)

output:

    S.No         Peak        Section  ... Range (Level 3) Prom-M Ascents
0     1.     Fuji-san          Kanto  ...          Honshu   3776     318
1     2.    Kita-dake          Chubu  ...          Honshu   2237      31
2     3.  Hotaka-dake          Chubu  ...          Honshu   2305      23
3     4.    Aino-dake          Chubu  ...          Honshu    299      17
4     5.  Yariga-take          Chubu  ...          Honshu    432      29
..   ...          ...            ...  ...             ...    ...     ...
95   96.  Meakan-dake       Hokkaido  ...        Hokkaido   1147       7
96   97.    Amagi-san          Chubu  ...          Honshu   1004      14
97   98.   Ibuki-yama  Western Japan  ...          Honshu    657       8
98   99.  Kaimon-dake  Western Japan  ...          Kyushu    867       9
99  100.  Tsukuba-san          Kanto  ...          Honshu    797      20
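
To get the separate sections (a, b, c) asked for in the question, the per-cell text can be regrouped by column instead of by row. The following is only a minimal sketch: SAMPLE_HTML is a hypothetical stand-in for the scraped page (its "Rank" and "Elev-M" header names are assumptions, not taken from the site), and table_columns is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page: two data rows laid out like
# the "before" lines in the question. Header names here are assumptions.
SAMPLE_HTML = """
<table class="gray">
  <tr><th>Rank</th><th>Peak</th><th>Section</th><th>Elev-M</th></tr>
  <tr><td>1.</td><td>Fuji-san</td><td>Kanto</td><td>3776</td></tr>
  <tr><td>2.</td><td>Kita-dake</td><td>Chubu</td><td>3192</td></tr>
</table>
"""


def table_columns(html):
    """Return the table's data cells grouped by column instead of by row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Read each <td> separately so the words are not glued together
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    # zip(*rows) transposes the list of rows into one tuple per column
    return [list(column) for column in zip(*rows)]


rank, a, b, c = table_columns(SAMPLE_HTML)
print(a)
print(b)
print(c)
```

With the DataFrame built above, the same lists should be available per column, e.g. df["Peak"].tolist().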