How to scrape a multi-page website using R

Time:11-09

I want to scrape the contents of a multi-page website using R. Currently I'm able to scrape only the first page. How do I scrape all pages and store them in a CSV file?

Here's my code so far:

library(rvest)
library(tibble)
library(tidyr)
library(dplyr)
df = 'https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p=1&selectedItem=viewAllAwardedContracts.do&T01_ps=100' %>% 
  read_html() %>% html_table()
df
write.csv(df,"Contracts_test_taneps.csv")

Answer:

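To scrape multiple pages, wrap the request in a function and map it over the page numbers. Change 1:2 to 1:N, where N is the number of pages you want.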

library(tidyverse)
library(rvest)

get_taneps <- function(page) {
  str_c("https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p=", 
        page, "&selectedItem=viewAllAwardedContracts.do&T01_ps=100") %>% 
    read_html() %>% 
    html_table() %>% 
    getElement(1) %>% 
    janitor::clean_names()
}

# Bind the pages into one data frame; df is what gets written to .csv below
df <- map_dfr(1:2, get_taneps)
df

# A tibble: 200 x 7
   tender_no                                             procuring_entity                suppl~1 award~2 award~3 lot_n~4 notic~5
   <chr>                                                 <chr>                           <chr>   <chr>   <chr>   <chr>   <lgl>  
 1 AE/005/2022-2023/MOROGORO/FA/G/01                     Morogoro Municipal Council      SHIBAM~ 08/11/~ "66200~ N/A     NA     
 2 AE/005/2022-2023/DODOMA/FA/NC/02                      Ministry of Livestock and Fish~ NINO G~ 04/11/~ "46511~ N/A     NA     
 3 LGA/014/2022/2023/G/01 UTAWALA                        Bagamoyo District Council       VILANG~ 02/11/~ "90000~ N/A     NA     
 4 LGA/014/014/2022/2023/G/01 FEDHA 3EPICAR              Bagamoyo District Council       VILANG~ 02/11/~ "88100~ N/A     NA     
 5 LGA/014/2022/2023/G/01/ARDHI                          Bagamoyo District Council       VILANG~ 31/10/~ "16088~ N/A     NA     
 6 LGA/014/2022/2023/G/11 VIFAA VYA USAFI SOKO LA SAMAKI Bagamoyo District Council       MBUTUL~ 31/10/~ "10000~ N/A     NA     
 7 DCD - 000899- 400E - ANIMAL FEEDS                     Kibaha Education Centre         ALOYCE~ 29/10/~ "82400~ N/A     NA     
 8 AE/005/2022-2023/MOROGORO/FA/G/01                     Morogoro Regional Referral Hos~ JIGABH~ 02/11/~ "17950~ N/A     NA     
 9 IE/023/2022-23/HQ/G/13                                Commission for Mediation and A~ AKO GR~ 27/10/~ "42500~ N/A     NA     
10 AE/005/2022-2023/MOROGORO/FA/G/05                     Morogoro Municipal Council      THE GR~ 01/11/~ "17247~ N/A     NA     
# ... with 190 more rows, and abbreviated variable names 1: supplier_name, 2: award_date, 3: award_amount, 4: lot_name,
#   5: notice_pdf
# i Use `print(n = ...)` to see more rows
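
If you don't know the number of pages up front, one option is to keep requesting pages until one comes back empty. This is only a sketch: scrape_until_empty, max_pages and delay are names made up here, and it assumes the site returns an empty table (or an error) once you go past the last page, which has not been verified for this site. It uses base R for the stacking so it does not depend on any particular package version.

```r
# Hypothetical helper, not part of rvest or the tidyverse.
# fetch_page: a function that takes a page number and returns a data frame,
#             for example get_taneps from the answer above.
scrape_until_empty <- function(fetch_page, max_pages = 50, delay = 1) {
  pages <- list()
  for (page in seq_len(max_pages)) {
    # Treat a failed request as "no more pages"
    result <- tryCatch(fetch_page(page), error = function(e) NULL)
    if (is.null(result) || nrow(result) == 0) break
    # Store every column as character so pages whose column types were
    # guessed differently can still be stacked
    result[] <- lapply(result, as.character)
    pages[[length(pages) + 1]] <- result
    Sys.sleep(delay)  # pause between requests to go easy on the server
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}
```

Usage would be df <- scrape_until_empty(get_taneps). If the site instead repeats the last page for out-of-range page numbers, the loop only stops at max_pages, so it is worth checking the row count and looking for duplicated rows afterwards.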

Write the combined table to a .csv file:

write_csv(df, "Contracts_test_taneps.csv")
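
If you prefer base R's write.csv, as in the question, note that it adds a column of row names unless you pass row.names = FALSE. The snippet below is a small self-contained check of that; the contracts data frame holds made-up stand-in values rather than scraped data, and it writes to a temporary directory instead of the working directory.

```r
# Stand-in data frame using two of the column names from the output above;
# the values are invented for illustration only
contracts <- data.frame(
  tender_no = c("EXAMPLE/001", "EXAMPLE/002"),
  procuring_entity = c("Example Council A", "Example Council B"),
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "Contracts_test_taneps.csv")

# row.names = FALSE keeps the 1, 2, 3, ... row labels out of the file
write.csv(contracts, out_file, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(names(check))
```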