Below is my HTML code. I only want to get the td data inside this table, keeping just the necessary values, and I would like to remove the data on the lines that include ">". I have no idea how to extract the HTML code line having "...
And I hope I can a
[['1501','9,445', '50', ' 0.53%', '0', '1','1,000', '94', 'N/A', 'N/A', 'N/A'],
['1502','18,875', '195', '-1.02%', '0', '7','500', '94', 'N/A', 'N/A', 'N/A'],
...............................................,
['1550','8,350', '95', ' 1.15%', '0', '2,601','1,000', '84', 'N/A', 'N/A', 'N/A']]
My Python code is below:
stock_list = soup.find("table", attrs={"class": "type_2"}).find("tbody").find_all("tr")
for stock in stock_list:
    if len(stock) > 1:
        stock.get_text().split()
But I only get something like:
[['1501','메리츠', '인버스','2X', '국채10년ETN' ,'9,445', '50', ' 0.53%', '0', '1','1,000', '94', 'N/A', 'N/A', 'N/A'],
['1502','KB', '레버리지','구리', '선물ETN(H)' ,'18,875', '195', '-1.02%', '0', '7','500', '94', 'N/A', 'N/A', 'N/A'],
...............................................,
['1550','TRUE', '인버스','2X', 'HSCEI','ETN(H)' ,'8,350', '95', ' 1.15%', '0', '2,601','1,000', '84', 'N/A', 'N/A', 'N/A']]
The HTML code is below:
<table summary="코스피 시세정보를 선택한 항목에 따라 정보를 제공합니다." cellpadding="0" cellspacing="0" >
<caption>코스피</caption>
<colgroup>
<col width="2%">
<col width="*">
<col width="7%">
<col width="9%">
<col width="7%">
<col width="8%">
<col width="8%">
<col width="8%">
<col width="8%">
<col width="8%">
<col width="8%">
<col width="8%">
<col width="6%">
</colgroup>
<thead>
<tr>
<th scope="col">N</th>
<th scope="col">종목명</th>
<th scope="col">현재가</th>
<th scope="col" style="padding-right:8px">전일비</th>
<th scope="col">등락률</th>
<th scope="col">액면가</th>
<th scope="col">거래량</th>
<th scope="col">상장주식수</th>
<th scope="col">시가총액</th>
<th scope="col">PER</th>
<th scope="col">ROE</th>
<th scope="col">PBR</th>
<th scope="col">토론실</th>
</tr>
</thead>
<tbody>
<tr><td colspan="10" ></td></tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)" style="background-color: rgb(255, 255, 255);">
<td >1501</td>
<td><a href="/item/main.naver?code=610021" >메리츠 인버스 2X 국채10년 ETN</a></td>
<td >9,445</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
50
</span>
</td>
<td >
<span >
0.53%
</span>
</td>
<td >0</td>
<td >1</td>
<td >1,000</td>
<td >94</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=610021"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)" style="background-color: rgb(255, 255, 255);">
<td >1502</td>
<td><a href="/item/main.naver?code=580032" >KB 레버리지 구리 선물 ETN(H)</a></td>
<td >18,875</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
195
</span>
</td>
<td >
<span >
-1.02%
</span>
</td>
<td >0</td>
<td >7</td>
<td >500</td>
<td >94</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=580032"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)" style="background-color: rgb(255, 255, 255);">
<td >1503</td>
<td><a href="/item/main.naver?code=570064" >TRUE 인버스 베트남 VN30 선물 ETN(H)</a></td>
<td >9,415</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
55
</span>
</td>
<td >
<span >
-0.58%
</span>
</td>
<td >0</td>
<td >260</td>
<td >1,000</td>
<td >94</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=570064"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)" style="background-color: rgb(255, 255, 255);">
<td >1504</td>
<td><a href="/item/main.naver?code=256450" >ARIRANG 심천차이넥스트(합성)</a></td>
<td >15,680</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
5
</span>
</td>
<td >
<span >
0.03%
</span>
</td>
<td >0</td>
<td >538</td>
<td >600</td>
<td >94</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=256450"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)" style="background-color: rgb(255, 255, 255);">
<td >1505</td>
<td><a href="/item/main.naver?code=380340" >KINDEX Fn5G플러스</a></td>
<td >9,405</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
100
</span>
</td>
<td >
<span >
1.07%
</span>
</td>
<td >0</td>
<td >6,537</td>
<td >1,000</td>
<td >94</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=380340"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1506</td>
<td><a href="/item/main.naver?code=530087" >삼성 KRX 2차전지 K-뉴딜 ETN</a></td>
<td >9,335</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
60
</span>
</td>
<td >
<span >
-0.64%
</span>
</td>
<td >0</td>
<td >15</td>
<td >1,000</td>
<td >93</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=530087"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1507</td>
<td><a href="/item/main.naver?code=152500" >KINDEX 레버리지</a></td>
<td >9,320</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
5
</span>
</td>
<td >
<span >
-0.05%
</span>
</td>
<td >0</td>
<td >4,547</td>
<td >1,000</td>
<td >93</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=152500"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1508</td>
<td><a href="/item/main.naver?code=407300" >HANARO Fn골프테마</a></td>
<td >9,270</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
40
</span>
</td>
<td >
<span >
-0.43%
</span>
</td>
<td >0</td>
<td >7,021</td>
<td >1,000</td>
<td >93</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=407300"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1509</td>
<td><a href="/item/main.naver?code=500012" >신한 인버스 달러인덱스 선물 ETN(H)</a></td>
<td >9,225</td>
<td >
<span >0</span>
</td>
<td >
<span >0.00%</span>
</td>
<td >0</td>
<td >0</td>
<td >1,000</td>
<td >92</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=500012"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1510</td>
<td><a href="/item/main.naver?code=227830" >ARIRANG 코스피</a></td>
<td >30,720</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
15
</span>
</td>
<td >
<span >
-0.05%
</span>
</td>
<td >0</td>
<td >44</td>
<td >300</td>
<td >92</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=227830"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1511</td>
<td><a href="/item/main.naver?code=364690" >KODEX 혁신기술테마액티브</a></td>
<td >13,060</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
30
</span>
</td>
<td >
<span >
0.23%
</span>
</td>
<td >0</td>
<td >977</td>
<td >700</td>
<td >91</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=364690"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1512</td>
<td><a href="/item/main.naver?code=189400" >ARIRANG 글로벌MSCI(합성 H)</a></td>
<td >17,870</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
115
</span>
</td>
<td >
<span >
-0.64%
</span>
</td>
<td >0</td>
<td >272</td>
<td >510</td>
<td >91</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=189400"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1513</td>
<td><a href="/item/main.naver?code=272230" >KINDEX 스마트밸류</a></td>
<td >15,055</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
20
</span>
</td>
<td >
<span >
-0.13%
</span>
</td>
<td >0</td>
<td >20</td>
<td >600</td>
<td >90</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=272230"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1514</td>
<td><a href="/item/main.naver?code=570023" >TRUE 인버스 2X S&P500 선물 ETN(H)</a></td>
<td >1,800</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
20
</span>
</td>
<td >
<span >
1.12%
</span>
</td>
<td >0</td>
<td >27,146</td>
<td >5,000</td>
<td >90</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=570023"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1515</td>
<td><a href="/item/main.naver?code=167860" >KOSEF 국고채10년레버리지</a></td>
<td >127,925</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
200
</span>
</td>
<td >
<span >
-0.16%
</span>
</td>
<td >0</td>
<td >751</td>
<td >70</td>
<td >90</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=167860"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1516</td>
<td><a href="/item/main.naver?code=610008" >메리츠 레버리지 국채30년 ETN</a></td>
<td >8,915</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
165
</span>
</td>
<td >
<span >
-1.82%
</span>
</td>
<td >0</td>
<td >1,098</td>
<td >1,000</td>
<td >89</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=610008"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1517</td>
<td><a href="/item/main.naver?code=005965" >동부건설우</a></td>
<td >39,400</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
2,300
</span>
</td>
<td >
<span >
6.20%
</span>
</td>
<td >5,000</td>
<td >5,382</td>
<td >226</td>
<td >89</td>
<td >8.55</td>
<td >N/A</td>
<td >1.67</td>
<td ><a href="/item/board.naver?code=005965"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1518</td>
<td><a href="/item/main.naver?code=700003" >하나 KRX BBIG K-뉴딜 ETN</a></td>
<td >8,890</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_down.gif" width="7" height="6" style="margin-right:4px;" alt="하락"><span >
65
</span>
</td>
<td >
<span >
-0.73%
</span>
</td>
<td >0</td>
<td >5</td>
<td >1,000</td>
<td >89</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=700003"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1519</td>
<td><a href="/item/main.naver?code=014825" >동원시스템즈우</a></td>
<td >33,500</td>
<td >
<span >0</span>
</td>
<td >
<span >0.00%</span>
</td>
<td >5,000</td>
<td >73</td>
<td >265</td>
<td >89</td>
<td >15.17</td>
<td >N/A</td>
<td >1.62</td>
<td ><a href="/item/board.naver?code=014825"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1520</td>
<td><a href="/item/main.naver?code=307510" >TIGER 의료기기</a></td>
<td >19,665</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
310
</span>
</td>
<td >
<span >
1.60%
</span>
</td>
<td >0</td>
<td >7,758</td>
<td >450</td>
<td >88</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=307510"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr onmouseover="mouseOver(this)" onmouseout="mouseOut(this)">
<td >1550</td>
<td><a href="/item/main.naver?code=570032" >TRUE 인버스 2X HSCEI ETN(H)</a></td>
<td >8,350</td>
<td >
<img src="https://ssl.pstatic.net/imgstock/images/images4/ico_up.gif" width="7" height="6" style="margin-right:4px;" alt="상승"><span >
95
</span>
</td>
<td >
<span >
1.15%
</span>
</td>
<td >0</td>
<td >2,601</td>
<td >1,000</td>
<td >84</td>
<td >N/A</td>
<td >N/A</td>
<td >N/A</td>
<td ><a href="/item/board.naver?code=570032"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a></td>
</tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
<tr><td colspan="13" ></td></tr>
</tbody>
</table>
Answer:
Note: The question could be improved, and a URL would help to get more context and come up with more specific solutions. The example below is based on the content actually provided.
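Why the names get split: get_text().split() is applied to the whole row, so every space inside a stock name such as "메리츠 인버스 2X 국채10년 ETN" becomes a separator. A minimal sketch, using a hypothetical trimmed copy of one row from the question as stand-in HTML, compares that with reading each td on its own:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one spacer row plus one trimmed data row from the question.
html = """
<table class="type_2"><tbody>
<tr><td colspan="10"></td></tr>
<tr>
<td>1501</td>
<td><a href="/item/main.naver?code=610021">메리츠 인버스 2X 국채10년 ETN</a></td>
<td>9,445</td>
</tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")
row = soup.select("table.type_2 tbody tr")[1]  # index 0 is the empty spacer row

# Splitting the whole row's text on whitespace also splits the stock name.
split_tokens = row.get_text().split()

# Reading each <td> separately keeps every cell, including the name, intact.
cell_texts = [td.get_text(strip=True) for td in row.find_all("td")]

print(split_tokens)
print(cell_texts)
```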
How to fix?
Select the needed elements more specifically; CSS selectors can be used for this. The following line selects all rows of the table whose caption contains "코스피" and that contain neither a th nor an element with a colspan attribute:
soup.select('table:has(caption:-soup-contains("코스피")) tr:not(:has(th, [colspan]))')
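To see which rows that selector keeps, here is a small sketch on a hypothetical miniature of the question's table (header row, colspan spacer rows, two data rows). It assumes a BeautifulSoup install whose soupsieve version supports :-soup-contains:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the question's table: header, spacer rows, data rows.
html = """
<table>
<caption>코스피</caption>
<thead><tr><th>N</th><th>종목명</th><th>현재가</th></tr></thead>
<tbody>
<tr><td colspan="10"></td></tr>
<tr><td>1501</td><td><a href="#">메리츠 인버스 2X 국채10년 ETN</a></td><td>9,445</td></tr>
<tr><td colspan="13"></td></tr>
<tr><td>1502</td><td><a href="#">KB 레버리지 구리 선물 ETN(H)</a></td><td>18,875</td></tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The header row (has a th) and the spacer rows (have a [colspan] cell) are skipped.
rows = soup.select('table:has(caption:-soup-contains("코스피")) tr:not(:has(th, [colspan]))')
first_cells = [r.td.get_text() for r in rows]

print(len(rows), first_cells)
```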
This builds the result list data with the text of each row's cells:
data = []
for row in soup.select('table:has(caption:-soup-contains("코스피")) tr:not(:has(th, [colspan]))'):
    data.append([x.get_text(strip=True) for x in row.select('td:not(.title)')])
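The 전일비 and 등락률 cells wrap their numbers in line breaks (an img plus a span), so .text returns the value with surrounding whitespace, while get_text(strip=True) returns just the value. A sketch on one cell copied from the question's HTML (the shortened img src is an assumption):

```python
from bs4 import BeautifulSoup

# One 전일비 cell from the question: an <img> plus a <span> whose number
# is surrounded by line breaks.
html = """<td>
<img src="ico_up.gif" alt="상승"><span>
50
</span>
</td>"""
cell = BeautifulSoup(html, "html.parser").td

raw = cell.text                     # keeps the newlines around "50"
clean = cell.get_text(strip=True)   # strips each text piece and drops empty ones

print(repr(raw))
print(repr(clean))
```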
EDIT
Based on the additional context from the URL (finance.naver.com/sise/sise_market_sum.nhn?page=31), the CSS selectors change as follows.
To get the data (the table is the only one with the class name type_2):
for row in soup.select('table.type_2 tr:not(:has(th, [colspan]))'):
    data.append([x.get_text(strip=True) for x in row.select('td:not(.title)')])
To get the headings:
list(soup.select_one('table.type_2 tr').stripped_strings)
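The headings come from the first row, and each data row yields one entry per td. The 토론실 cell holds only an image, so its text is an empty string, but it still counts as a cell, which keeps every row the same length as the headings. A sketch on a hypothetical three-column version of the table:

```python
from bs4 import BeautifulSoup

# Hypothetical three-column version of the type_2 table.
html = """
<table class="type_2">
<thead><tr><th scope="col">N</th><th scope="col">종목명</th><th scope="col">토론실</th></tr></thead>
<tbody>
<tr><td colspan="13"></td></tr>
<tr><td>1501</td><td><a href="/item/main.naver?code=610021">메리츠 인버스 2X 국채10년 ETN</a></td><td><a href="/item/board.naver?code=610021"><img src="ico_debatebl2.gif" alt="토론실"></a></td></tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The first tr in document order is the header row.
headings = list(soup.select_one("table.type_2 tr").stripped_strings)

# One list of cell texts per data row; the image-only cell becomes "".
rows = [
    [td.get_text(strip=True) for td in tr.select("td:not(.title)")]
    for tr in soup.select("table.type_2 tr:not(:has(th, [colspan]))")
]

print(headings)
print(rows)
```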
Example (create a DataFrame from data):
import requests
import pandas as pd
from bs4 import BeautifulSoup
html = requests.get('https://finance.naver.com/sise/sise_market_sum.nhn?page=31').text
soup = BeautifulSoup(html, 'lxml')
data = []
for row in soup.select('table.type_2 tr:not(:has(th, [colspan]))'):
    data.append([x.get_text(strip=True) for x in row.select('td:not(.title)')])
df = pd.DataFrame(data, columns=list(soup.select_one('table.type_2 tr').stripped_strings))
print(df)
Output
N | 종목명 | 현재가 | 전일비 | 등락률 | 액면가 | 거래량 | 상장주식수 | 시가총액 | PER | ROE | PBR | 토론실 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1501 | 메리츠 인버스 2X 국채10년 ETN | 9,445 | 50 | 0.53% | 0 | 1 | 1,000 | 94 | N/A | N/A | N/A | |
1502 | KB 레버리지 구리 선물 ETN(H) | 18,875 | 195 | -1.02% | 0 | 7 | 500 | 94 | N/A | N/A | N/A | |
1503 | TRUE 인버스 베트남 VN30 선물 ETN(H) | 9,415 | 55 | -0.58% | 0 | 260 | 1,000 | 94 | N/A | N/A | N/A | |
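The question asked for rows without the stock name. A minimal sketch, using two rows copied from the output above as stand-in data, drops the 종목명 column and the empty 토론실 column and turns the DataFrame back into a list of lists in the shape the question shows:

```python
import pandas as pd

# Stand-in data: two rows in the shape produced by the scraping loop.
columns = ["N", "종목명", "현재가", "전일비", "등락률", "액면가", "거래량",
           "상장주식수", "시가총액", "PER", "ROE", "PBR", "토론실"]
data = [
    ["1501", "메리츠 인버스 2X 국채10년 ETN", "9,445", "50", "0.53%", "0", "1",
     "1,000", "94", "N/A", "N/A", "N/A", ""],
    ["1502", "KB 레버리지 구리 선물 ETN(H)", "18,875", "195", "-1.02%", "0", "7",
     "500", "94", "N/A", "N/A", "N/A", ""],
]
df = pd.DataFrame(data, columns=columns)

# Drop the name and the image-only discussion column, then back to plain lists.
result = df.drop(columns=["종목명", "토론실"]).values.tolist()

print(result)
```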