When using BeautifulSoup to scrape a table from https://egov.uscis.gov/processing-times/historic-pt, instead of getting the values that can be seen in the table, I get what seems to be a call to some sort of database:
table = webpage.select("table.records")
table
df = pd.read_html(str(table), na_values=0)[0]
df
Form Form Description Classification or Basis for Filing FY 2017 FY 2018 FY 2019 FY 2020 FY 2021 FY 20225
0 ${data.FORM_NAME} ${data.FORM_TITLE_EN} ${data.FORM_DESC_EN} ${data.FY14} ${data.FY15} ${data.FY16} ${data.FY17} ${data.NAT_AVG_MONTHS} ${data.FY22}
When I inspect the table with F12, I can see several <tr> elements with the content that I wish to scrape; however, when I look at the source code, I see what I suspect is a call to a database:
<tbody >
<tr v-for="data in histFormsData">
<th scope="row" style="border-right:1px solid black; font-weight:bold">${data.FORM_NAME}</th>
<th scope="row">${data.FORM_TITLE_EN}</th>
<th scope="row" style="border-right:1px solid black">${data.FORM_DESC_EN}</th>
<td style="border-right:1px solid black; text-align:center">${data.FY14}</td>
<td style="border-right:1px solid black; border-right:1px solid black;text-align:center">${data.FY15}</td>
<td style="border-right:1px solid black; text-align:center">${data.FY16}</td>
<td style="border-right:1px solid black; text-align:center">${data.FY17}</td>
<td style="border-right:1px solid black; text-align:center">${data.NAT_AVG_MONTHS}</td>
<td style="text-align:center">${data.FY22}</td>
</tr>
</tbody>
This is the code I used to request the webpage:
#Load webpage content
r = requests.get("https://egov.uscis.gov/processing-times/historic-pt")
#Convert to beautiful soup object
webpage = bs(r.content, "html.parser")
print(webpage.prettify())
What can I do to get the row content that can be seen in the page? I am new to web scraping and I was not able to find my question online.
Thanks in advance.
I tried importing the required packages, requesting the webpage, and using pandas to get the table:
#Import important packages
import requests # this one is for accessing webpages
from bs4 import BeautifulSoup as bs #scraping tool
import pandas as pd #pandas
#Load webpage content
r = requests.get("https://egov.uscis.gov/processing-times/historic-pt")
#Convert to beautiful soup object
webpage = bs(r.content, "html.parser")
print(webpage.prettify())
#Scraping table with pandas
table = webpage.select("table.records")
table
df = pd.read_html(str(table), na_values=0)[0]
df
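A quick way to check whether the downloaded HTML is an unrendered template rather than real data is to look for `v-for` attributes and leftover `${...}` placeholders. The following is only a minimal sketch: the `html` string is a trimmed copy of the page source quoted above, and the names `template_rows` and `placeholders` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup that requests returns (taken from the page source above).
html = """
<table class="records"><tbody>
<tr v-for="data in histFormsData">
<th scope="row">${data.FORM_NAME}</th>
<td>${data.FY14}</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Rows that still carry a `v-for` directive are templates, not rendered data rows.
template_rows = soup.select("tr[v-for]")

# Cells whose text still contains `${` were never filled in by the page's JavaScript.
placeholders = [
    cell.get_text(strip=True)
    for cell in soup.select("th, td")
    if "${" in cell.get_text()
]

print(len(template_rows), placeholders)
```

If both checks find something, the values are filled in by JavaScript in the browser, and the static HTML alone will not contain them.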
Answer:
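The `v-for` and `${data...}` markup in the page source is a client-side (Vue-style) template: the browser fills in the rows with JavaScript, using data loaded from a different URL, so requests and BeautifulSoup only ever see the placeholders. Request that URL directly instead: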
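import requests
import pandas as pd
# JSON endpoint that the page's JavaScript reads to fill in the table
url = "https://egov.uscis.gov/processing-times/historical-forms-data"
# verify=False disables TLS certificate verification; remove it if the certificate validates for you
df = pd.DataFrame(requests.get(url, verify=False).json())
print(df)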
Prints:
FORM_NUMBER FORM_NAME FORM_NAME_ES FORM_TITLE_EN FORM_TITLE_ES FORM_DESC_EN FORM_DESC_ES FY14 FY15 FY16 FY17 NAT_AVG_MONTHS FY22
0 I90 I-90 I-90 Application to Replace Permanent Resident Card Solicitud para Reemplazar Tarjeta de Residente Permanente Initial issuance, replacement or renewal Emisión inicial, reemplazo o renovación 6.8 8.0 7.8 8.3 5.2 1.2
1 I102 I-102 I-102 Application for Replacement/Initial Nonimmigrant Arrival/Departure Record Solicitud para Reemplazar o Registro Inicial de Entrada / Salida de No Inmigrante Initial issuance or replacement of a Form I-94 Emisión inicial o reemplazo de un Formulario I-94 4.9 3.9 3.3 3.9 4.0 7.8
2 I129 I-129 I-129 Petition for a Nonimmigrant Worker Petición de Trabajador No Inmigrante Nonimmigrant Petition (Premium filed) Petición de No Inmigrante (con Procesamiento Prioritario) 0.4 0.4 0.4 0.4 0.3 0.3
3 I129 I-129 I-129 Petition for a Nonimmigrant Worker Petición de Trabajador No Inmigrante Nonimmigrant Petition (non Premium filed) Petición de No Inmigrante (sin Procesamiento Prioritario) 3.4 3.8 4.7 2.3 1.8 2.3
4 I129F I-129F I-129F Petition for Alien Fiancé(e) Petición de Prometido(a) Extranjero(a) All Classifications Todas las Clasificaciones 3.6 6.5 5.2 4.6 8.0 12.1
5 I130 I-130 I-130 Petition for Alien Relative Petición de Familiar Extranjero Immediate Relative Familiar Inmediato 6.5 7.6 8.6 8.3 10.2 10.3
6 I131 I-131 I-131 Application for Travel Document Solicitud de Documento de Viaje Advance Parole Document Documento de Permiso Adelantado 3.0 3.6 4.5 4.6 7.7 7.3
7 I131 I-131 I-131 Application for Travel Document Solicitud de Documento de Viaje Parole in Place Permiso de Permanencia en el País 2.5 3.3 3.3 4.8 4.9 4.7
8 I131 I-131 I-131 Application for Travel Document Solicitud de Documento de Viaje Travel Document Documento de Viaje 4.2 2.9 2.8 4.0 7.2 10.6
9 I140 I-140 I-140 Immigrant Petition for Alien Workers Petición de Trabajador Inmigrante Extranjero Immigrant Petition (Premium filed) Petición de Inmigrante (con Procesamiento Prioritario) 0.4 0.3 0.3 0.3 0.4 0.3
10 I140 I-140 I-140 Immigrant Petition for Alien Workers Petición de Trabajador Inmigrante Extranjero Immigrant Petition (non Premium filed) Petición de Inmigrante (sin Procesamiento Prioritario) 7.3 8.9 5.8 4.9 8.2 9.3
...
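To get a frame that looks like the table on the page, the JSON keys can be mapped to the visible headers. The sketch below is built on assumptions: the mapping is inferred from the column order of the template in the question (for example, `FY14` sits under "FY 2017" and `NAT_AVG_MONTHS` under "FY 2021"), the trailing "5" in "FY 20225" is treated as a footnote marker, and the two sample records are copied from the output above with the values typed as floats. `column_map` and `table` are illustrative names.

```python
import pandas as pd

# Two sample records copied from the printed output (Spanish fields omitted).
# Assumption: the endpoint returns these values as numbers, not strings.
records = [
    {"FORM_NAME": "I-90",
     "FORM_TITLE_EN": "Application to Replace Permanent Resident Card",
     "FORM_DESC_EN": "Initial issuance, replacement or renewal",
     "FY14": 6.8, "FY15": 8.0, "FY16": 7.8, "FY17": 8.3,
     "NAT_AVG_MONTHS": 5.2, "FY22": 1.2},
    {"FORM_NAME": "I-130",
     "FORM_TITLE_EN": "Petition for Alien Relative",
     "FORM_DESC_EN": "Immediate Relative",
     "FY14": 6.5, "FY15": 7.6, "FY16": 8.6, "FY17": 8.3,
     "NAT_AVG_MONTHS": 10.2, "FY22": 10.3},
]
df = pd.DataFrame(records)

# Mapping inferred from the template's column order; not confirmed by the site.
column_map = {
    "FORM_NAME": "Form",
    "FORM_TITLE_EN": "Form Description",
    "FORM_DESC_EN": "Classification or Basis for Filing",
    "FY14": "FY 2017",
    "FY15": "FY 2018",
    "FY16": "FY 2019",
    "FY17": "FY 2020",
    "NAT_AVG_MONTHS": "FY 2021",
    "FY22": "FY 2022",
}

# Keep only the columns shown on the page, in page order, with the page's headers.
table = df[list(column_map)].rename(columns=column_map)
print(table)
```

With the real data, the same two lines at the end would be applied to the `df` built from the JSON response.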