I want to scrape the main table from the page requested in the code below.
However, I don't get any results with either .find_all() or .xpath().
import requests
from bs4 import BeautifulSoup
page = requests.get('https://paperswithcode.com/sota/3d-object-detection-on-kitti-cars-moderate')
soup = BeautifulSoup(page.text, 'html.parser')
soup.find_all('table')
# out: []
import requests
import lxml.html as lh
page = requests.get('https://paperswithcode.com/sota/3d-object-detection-on-kitti-cars-moderate')
doc = lh.fromstring(page.content)
doc.xpath('//*[@id="leaderboard"]/div[3]/table')
# out: []
What's the problem and how do I get the correct result?
CodePudding user response:
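The table is not part of the HTML that requests receives, because the page builds it with JavaScript. The underlying data is still in that response, though, as JSON inside a script block (id evaluation-table-data), so you can parse it directly without a browser.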
import requests
import json
from bs4 import BeautifulSoup
url = 'https://paperswithcode.com/sota/3d-object-detection-on-kitti-cars-moderate'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
# The leaderboard rows are stored as JSON inside <script id="evaluation-table-data">
table_json = json.loads(soup.find('script', {'id': 'evaluation-table-data'}).get_text())
for row in table_json:
    print(row['rank'], row['method'], row['metrics']['AP'], row['paper']['title'], row['evaluation_date'][:4])
OUTPUT:
1 BtcDet 82.86% Behind the Curtain: Learning Occluded Shapes for 3D Object Detection 2021
2 SE-SSD 82.54% SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud 2020
3 SPG 82.13 % SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation 2021
4 VoTr-TSD (ours) 82.09% Voxel Transformer for 3D Object Detection 2021
5 Pyramid-PV 82.08% Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection 2021
6 PV-RCNN 81.88% PV-RCNN : Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection 2021
7 M3DeTR 81.73 M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers 2021
8 Voxel R-CNN 81.62% Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection 2020
9 PV-RCNN 81.43% PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection 2019
10 SVGA-Net 80.82 % SVGA-Net: Sparse Voxel-Graph Attention Network for 3D Object Detection from Point Clouds 2020
11 CIA-SSD 80.28% CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud 2020
12 SA-SSD EBM 80.12% Accurate 3D Object Detection using Energy-Based Models 2020
13 PC-RGNN 79.9% PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection 2020
14 Joint 78.96% Joint 3D Instance Segmentation and Object Detection for Autonomous Driving 2020
15 STD 77.63% STD: Sparse-to-Dense 3D Object Detector for Point Cloud 2019
16 UberATG-MMF 76.75% Multi-Task Multi-Sensor Fusion for 3D Object Detection 2020
17 F-ConvNet 76.51% Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection 2019
18 PointRGCN 75.73% PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement 2019
19 PointRCNN 75.42% PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud 2018
20 PointPillars 74.99% PointPillars: Fast Encoders for Object Detection from Point Clouds 2018
21 PC-CNN-V2 73.80% A General Pipeline for 3D Detection of Vehicles 2018
22 RoarNet 73.04% RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement 2018
23 3D-FCT 72.79% 3D-FCT: Simultaneous 3D Object Detection and Tracking Using Feature Correlation 2021
24 IPOD 72.57% IPOD: Intensive Point-based Object Detector for Point Cloud 2018
25 AVOD Feature Pyramid 71.88% Joint 3D Proposal Generation and Object Detection from View Aggregation 2017
26 Frustum PointNets 70.39% Frustum PointNets for 3D Object Detection from RGB-D Data 2017
27 VoxelNet 65.11% VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection 2017
28 PGD 11.77% Probabilistic and Geometric Depth: Detecting Objects in Perspective 2021
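To see why find_all('table') returns nothing while the script lookup works, here is a minimal offline sketch. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs. SAMPLE_HTML and the ScriptJsonExtractor class are hypothetical stand-ins, not the real page or any library API; only the script id and the two sample rows are taken from the answer above.

```python
import json
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page: the markup contains no <table>,
# only a JSON payload inside a <script> tag (two rows copied from the output above).
SAMPLE_HTML = """
<html><body>
<div id="leaderboard"></div>
<script id="evaluation-table-data">
[{"rank": 1, "method": "BtcDet", "metrics": {"AP": "82.86%"}},
 {"rank": 2, "method": "SE-SSD", "metrics": {"AP": "82.54%"}}]
</script>
</body></html>
"""


class ScriptJsonExtractor(HTMLParser):
    """Count <table> tags and collect the text of one <script id=...> block."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self.table_count = 0
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_count += 1
        elif tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents arrive here as plain text
        if self._inside:
            self._chunks.append(data)

    def payload(self):
        return json.loads("".join(self._chunks))


parser = ScriptJsonExtractor("evaluation-table-data")
parser.feed(SAMPLE_HTML)
parser.close()
rows = parser.payload()

print(parser.table_count)  # number of <table> elements in the static markup
for row in rows:
    print(row["rank"], row["method"], row["metrics"]["AP"])
```

The static markup has zero table elements, which matches the empty list in the question, yet the rows are fully recoverable from the script block. Against the live page you would feed response.text instead of SAMPLE_HTML, assuming the site still ships the data this way.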
CodePudding user response:
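The table is rendered by JavaScript, so the page is dynamic and requests alone never sees it. In this case you can load the page with selenium and parse the rendered source with pandas (or bs4).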
import time
from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://paperswithcode.com/sota/3d-object-detection-on-kitti-cars-moderate')
driver.maximize_window()
time.sleep(3)  # give the JavaScript time to render the table
html = driver.page_source
driver.quit()  # close the browser once the rendered HTML is captured
# pandas 2.1+ deprecates passing a literal HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(html))[0]
print(df)
Output:
Rank Model AP ... Result Year Tags
0 1 BtcDet 82.86% ... Enter 2021 NaN
1 2 SE-SSD 82.54% ... Enter 2020 NaN
2 3 SPG 82.13 % ... Enter 2021 NaN
3 4 VoTr-TSD (ours) 82.09% ... Enter 2021 NaN
4 5 Pyramid-PV 82.08% ... Enter 2021 NaN
5 6 PV-RCNN 81.88% ... Enter 2021 NaN
6 7 M3DeTR 81.73 ... Enter 2021 NaN
7 8 Voxel R-CNN 81.62% ... Enter 2020 NaN
8 9 PV-RCNN 81.43% ... Enter 2019 NaN
9 10 SVGA-Net 80.82 % ... Enter 2020 NaN
10 11 CIA-SSD 80.28% ... Enter 2020 NaN
11 12 SA-SSD EBM 80.12% ... Enter 2020 NaN
12 13 PC-RGNN 79.9% ... Enter 2020 NaN
13 14 Joint 78.96% ... Enter 2020 NaN
14 15 STD 77.63% ... Enter 2019 NaN
15 16 UberATG-MMF 76.75% ... Enter 2020 NaN
16 17 F-ConvNet 76.51% ... Enter 2019 NaN
17 18 PointRGCN 75.73% ... Enter 2019 GCN
18 19 PointRCNN 75.42% ... Enter 2018 NaN
19 20 PointPillars 74.99% ... Enter 2018 NaN
20 21 PC-CNN-V2 73.80% ... Enter 2018 NaN
21 22 RoarNet 73.04% ... Enter 2018 NaN
22 23 3D-FCT 72.79% ... Enter 2021 NaN
23 24 IPOD 72.57% ... Enter 2018 NaN
24 25 AVOD Feature Pyramid 71.88% ... Enter 2017 NaN
25 26 Frustum PointNets 70.39% ... Enter 2017 NaN
26 27 VoxelNet 65.11% ... Enter 2017 NaN
27 28 PGD 11.77% ... Enter 2021 NaN
[28 rows x 8 columns]