I have a target website with this code:
<div data-v-afa58544="" >
0.51
</div>
I'm trying to access that '0.51', but I have no idea how to reference the data-v-afa58544 attribute of the div. How can I access that value with Selenium (or BeautifulSoup)? I would appreciate any help.
Edit:
I can't access the element with the following code:
v = soup.find('div[data-v-afa58544]')
p = v.select_one('.pa-0 col col-4')
Even with v = soup.select_one or v = soup.find_all (and iterating over the result), I always get None.
Answer:
The text node of the div that carries the data-v-afa58544 attribute is 0.51.
Solution (using bs4), with either of these selectors:
v = soup.select_one(".pa-0.col.col-4").get_text(strip=True)
v = soup.select_one('div[data-v-afa58544]').get_text(strip=True)
Example:
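from bs4 import BeautifulSoup

html = '''
<div data-v-afa58544="" >
0.51
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
# print(soup.prettify())
# Select by the attribute's presence; the sample markup above carries no class.
v = soup.select_one('div[data-v-afa58544]')
# Guard against a missing element instead of raising AttributeError.
p = v.get_text(strip=True) if v else None
print(p)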
Output:
0.51
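As for why the attempts in the question return None, here is a minimal sketch. find() expects a tag name (plus optional attribute filters), not a CSS selector, and '.pa-0 col col-4' is read as a chain of descendant selectors rather than three classes on one element. The class attribute in the sample markup is an assumption taken from the selector in the question.

```python
from bs4 import BeautifulSoup

# Sample markup; the class attribute is assumed from the question's selector.
html = '<div data-v-afa58544="" class="pa-0 col col-4"> 0.51 </div>'
soup = BeautifulSoup(html, "html.parser")

# find() treats its first argument as a tag name, so a CSS selector matches nothing.
wrong = soup.find("div[data-v-afa58544]")

# With find(), filter on the attribute itself; True means "attribute is present".
by_find = soup.find("div", attrs={"data-v-afa58544": True})

# With select_one(), CSS syntax works; classes on one element are chained with dots.
by_attr = soup.select_one("div[data-v-afa58544]")
by_class = soup.select_one(".pa-0.col.col-4")

print(wrong)                          # None
print(by_find.get_text(strip=True))   # 0.51
print(by_attr.get_text(strip=True))   # 0.51
print(by_class.get_text(strip=True))  # 0.51
```

Note that data-v-... attributes are generated by Vue's scoped styles, so the hash may change when the site is rebuilt; a class-based selector may turn out to be more stable.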
Update:
import requests
import pandas as pd
headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
url = 'https://www.river.go.jp/kawabou/file/files/tmlist/rn/20221004/0845/1614200100163.json'
r = requests.get(url,headers=headers)
df = pd.json_normalize(r.json()['min10Values'])
print(df)
Output:
             obsTime  rn10m  rn10mCcd  rnInc  rnIncCcd
0   2022/10/04 08:40      1         0      6         0
1   2022/10/04 08:30      0         0      5         0
2   2022/10/04 08:20      1         0      5         0
3   2022/10/04 08:10      0         0      4         0
4   2022/10/04 08:00      1         0      4         0
5   2022/10/04 07:50      0         0      3         0
6   2022/10/04 07:40      1         0      3         0
7   2022/10/04 07:30      0         0      2         0
8   2022/10/04 07:20      0         0      2         0
9   2022/10/04 07:10      0         0      2         0
10  2022/10/04 07:00      0         0      2         0
11  2022/10/04 06:50      0         0      2         0
12  2022/10/04 06:40      1         0      2         0
13  2022/10/04 06:30      0         0      1         0
14  2022/10/04 06:20      0         0      1         0
15  2022/10/04 06:10      0         0      1         0
16  2022/10/04 06:00      0         0      1         0
17  2022/10/04 05:50      0         0      1         0
18  2022/10/04 05:40      1         0      1         0
19  2022/10/04 05:30      0         0      0         0
20  2022/10/04 05:20      0         0      0         0
21  2022/10/04 05:10      0         0      0         0
22  2022/10/04 05:00      0         0      0         0
23  2022/10/04 04:50      0         0      0         0
24  2022/10/04 04:40      0         0      0         0
25  2022/10/04 04:30      0         0      0         0
26  2022/10/04 04:20      0         0      0         0
27  2022/10/04 04:10      0         0      0         0
28  2022/10/04 04:00      0         0      0         0
29  2022/10/04 03:50      0         0      0         0
30  2022/10/04 03:40      0         0      0         0
31  2022/10/04 03:30      0         0      0         0
32  2022/10/04 03:20      0         0      0         0
33  2022/10/04 03:10      0         0      0         0
34  2022/10/04 03:00      0         0      0         0
35  2022/10/04 02:50      0         0      0         0
36  2022/10/04 02:40      0         0      0         0
37  2022/10/04 02:30      0         0      0         0
38  2022/10/04 02:20      0         0      0         0
39  2022/10/04 02:10      0         0      0         0
40  2022/10/04 02:00      0         0      0         0
41  2022/10/04 01:50      0         0      0         0
42  2022/10/04 01:40      0         0      0         0
43  2022/10/04 01:30      0         0      0         0
44  2022/10/04 01:20      0         0      0         0
45  2022/10/04 01:10      0         0      0         0
46  2022/10/04 01:00      0         0      0         0
47  2022/10/04 00:50      0         0      0         0
48  2022/10/04 00:40      0         0      0         0
49  2022/10/04 00:30      0         0      0         0
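If only the most recent reading is needed, pandas is optional. The following is a minimal sketch that assumes the payload has the shape implied by the output above (a 'min10Values' list of records with an 'obsTime' field). The sample string is made up for illustration, and the helper name latest_reading is hypothetical.

```python
import json

# Hypothetical sample mirroring the columns printed above; not real API output.
sample = '''
{"min10Values": [
  {"obsTime": "2022/10/04 08:40", "rn10m": 1, "rn10mCcd": 0, "rnInc": 6, "rnIncCcd": 0},
  {"obsTime": "2022/10/04 08:30", "rn10m": 0, "rn10mCcd": 0, "rnInc": 5, "rnIncCcd": 0}
]}
'''

def latest_reading(payload: dict) -> dict:
    """Return the record with the greatest obsTime, or {} if there are none."""
    records = payload.get("min10Values", [])
    # obsTime is zero-padded 'YYYY/MM/DD HH:MM', so string order equals time order.
    return max(records, key=lambda r: r["obsTime"], default={})

latest = latest_reading(json.loads(sample))
print(latest["obsTime"], latest["rn10m"])  # 2022/10/04 08:40 1
```

With the live request above, the same helper could be called as latest_reading(r.json()).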
Selenium and bs4:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service
webdriver_service = Service("./chromedriver") #Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url ='https://www.river.go.jp/kawabou/pcfull/tm?itmkndCd=4&ofcCd=20757&obsCd=62&isCurrent=true&fld=0'
driver.get(url)
driver.maximize_window()
time.sleep(8)
soup = BeautifulSoup(driver.page_source,"html.parser")
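e = []
for row in soup.select('div[] > span'):
    # Each span reads "label:value"; split it into the two parts.
    v = row.get_text(strip=True).split(':')
    d = {
        # Drop the trailing unit "m" and the "arrow_upward" icon text.
        v[0]: v[1].replace('marrow_upward', '')
    }
    e.append(d)
print(e)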
Output:
[{'水位': '0.58'}, {'時間雨量': '3.0mm'}, {'10分雨量': '1.0mm'}, {'降り始めからの雨量': '8.0mm'}]
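The list of one-key dictionaries can be folded into a single mapping with numeric values, which is easier to query. This is a sketch, assuming every value is a number optionally followed by a unit such as mm; the helper name to_number is hypothetical. The keys are 水位 (water level), 時間雨量 (hourly rainfall), 10分雨量 (10-minute rainfall) and 降り始めからの雨量 (rainfall since it started).

```python
import re

# The scraped result from the output above.
e = [{'水位': '0.58'}, {'時間雨量': '3.0mm'}, {'10分雨量': '1.0mm'}, {'降り始めからの雨量': '8.0mm'}]

def to_number(text):
    """Return the leading number in text as a float, or None if there is none."""
    m = re.match(r'\s*(-?\d+(?:\.\d+)?)', text)
    return float(m.group(1)) if m else None

# Merge the one-key dicts into one dict and convert each value.
readings = {k: to_number(v) for d in e for k, v in d.items()}
print(readings['水位'], readings['10分雨量'])  # 0.58 1.0
```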
And to get only the first entry:
e = []
row = soup.select('div[] > span')[0]
v = row.get_text(strip=True).split(':')
d = {
    v[0]: v[1].replace('marrow_upward', '')
}
e.append(d)
print(e)
Output:
[{'水位': '0.59'}]