Home > Enterprise >  Loop URL in SDMX format and combine the result into R dataframe
Loop URL in SDMX format and combine the result into R dataframe

Time:10-10

I'm trying to download World Bank's WITS data and convert them into R dataframe. The site's API (ref. pp. 12-13) does not seem to allow users to call all "reporters," "partners," and "products" at once, so one may need to loop through a list of country (either as "reporters" or "partners") using their XML request format:

http://wits.worldbank.org/API/V1/SDMX/V21/datasource/tradestats-tariff/reporter/usa/year/2000/partner/all/product/all/indicator/AHS-WGHTD-AVRG

(One will need to loop through the "usa" part by a list of "reporters" (country abbreviation)) My goal is to loop through a list a country abbreviation, generate a dataframe for each run, and then bind them together into a larger dataframe. So I referenced this post and use the following code, but it didn't get me any further. I posted my code at below and it will be really appreciated if someone could take a look at and share some tips on this.

# load required packages
library(RCurl)
library(XML)
devtools::install_github("opensdmx/rsdmx")
library(rsdmx)

# if just for one reporter (usa)
myUrl <- "http://wits.worldbank.org/API/V1/SDMX/V21/datasource/tradestats-tariff/reporter/usa/year/2000/partner/all/product/all/indicator/AHS-WGHTD-AVRG"

dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)

dim(stats)
[1] 5142    8

# looks like this
head(stats)
  FREQ REPORTER PARTNER     PRODUCTCODE
1    A      USA     ABW    01-05_Animal
2    A      USA     ABW 06-15_Vegetable
3    A      USA     ABW  16-24_FoodProd

## loop through a list of reporters (countries)

library(rvest)

# teams
reporters <- c("aus", "usa", "ukr")

# init
df <- data.frame()

# loop
for(i in reporters){
    # find url
    myUrl <- paste0("http://wits.worldbank.org/API/V1/SDMX/V21/datasource/tradestats-tariff/reporter/", i,"/year/2000/partner/all/product/all/indicator/AHS-WGHTD-AVRG")
    dataset <- readSDMX(myUrl)
    stats <- as.data.frame(dataset)
    # bind to dataframe
    df <- rbind(df, stats)
}

# view captured data
View(df)

# NOTHING!

CodePudding user response:

Consider R's built-in utils::download.file() and then parse with XML. Because your data is attribute-centric with no element text in <Series> and <Obs> nodes, consider the undocumented xmlAttrsToDataFrame, requiring triple colon qualifier, :::.

Finally, use an apply function like sapply and avoid the bookkeeping of for loops and inefficiently growing an object in a loop by calling rbind iteratively on same dataframe. Below even wraps download and XML parsing in tryCatch for potential errors like ukr.

library(XML)

build_df <- function(i) {
  url <- paste0(
    "http://wits.worldbank.org/API/V1/SDMX/V21/datasource/",
    "tradestats-tariff/reporter/", i,
    "/year/2000/partner/all/product/all/indicator/AHS-WGHTD-AVRG"
  )
  
  df <- NULL
  tryCatch({
    tmp <- tempfile()          # DOWNLOAD XML TO TEMP FILE
    download.file(url, tmp)
    
    # PARSE XML AND EXTRACT ATTRIBUTES TO DATA FRAME
    doc <- XML::xmlParse(tmp)
    df <- cbind(
      XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//Series")),
      XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//Obs"))
    )
    unlink(tmp)               # DELETE TEMP FILE
  }, warning = function(w) print(w)
  , error = function(e) print(e)
  )
  
  return(df)
}

reporters <- c("aus", "usa", "ukr")

# NAMED LIST OF DATA FRAMES
df_list <- sapply(reporters, build_df)

# COMPILE ALL INTO SINGLE DATA FRAME (CALL rbind ONCE)
final_df <- do.call(rbind, unname(df_list))

Output

df_list

head(df_list$aus)
  FREQ REPORTER PARTNER     PRODUCTCODE      INDICATOR TIME_PERIOD        OBS_VALUE DATASOURCE
1    A      AUS     AFG 50-63_TextCloth AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
2    A      AUS     AFG           manuf AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
3    A      AUS     AFG        Textiles AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
4    A      AUS     AFG           Total AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
5    A      AUS     AFG     UNCTAD-SoP3 AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
6    A      AUS     AGO 41-43_HidesSkin AHS-WGHTD-AVRG        2000               20   WITS-TRN

head(df_list$usa)
  FREQ REPORTER PARTNER     PRODUCTCODE      INDICATOR TIME_PERIOD OBS_VALUE DATASOURCE
1    A      USA     ABW    01-05_Animal AHS-WGHTD-AVRG        2000         0   WITS-TRN
2    A      USA     ABW 06-15_Vegetable AHS-WGHTD-AVRG        2000         0   WITS-TRN
3    A      USA     ABW  16-24_FoodProd AHS-WGHTD-AVRG        2000         0   WITS-TRN
4    A      USA     ABW     27-27_Fuels AHS-WGHTD-AVRG        2000         0   WITS-TRN
5    A      USA     ABW 28-38_Chemicals AHS-WGHTD-AVRG        2000         0   WITS-TRN
6    A      USA     ABW 39-40_PlastiRub AHS-WGHTD-AVRG        2000         0   WITS-TRN

head(df_list$ukr)
NULL

final_df

head(final_df)
  FREQ REPORTER PARTNER     PRODUCTCODE      INDICATOR TIME_PERIOD        OBS_VALUE DATASOURCE
1    A      AUS     AFG 50-63_TextCloth AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
2    A      AUS     AFG           manuf AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
3    A      AUS     AFG        Textiles AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
4    A      AUS     AFG           Total AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
5    A      AUS     AFG     UNCTAD-SoP3 AHS-WGHTD-AVRG        2000 0.46377738685431   WITS-TRN
6    A      AUS     AGO 41-43_HidesSkin AHS-WGHTD-AVRG        2000               20   WITS-TRN

tail(final_df)
     FREQ REPORTER PARTNER PRODUCTCODE      INDICATOR TIME_PERIOD          OBS_VALUE DATASOURCE
8966    A      USA     ZWE       Total AHS-WGHTD-AVRG        2000   5.65257483778078   WITS-TRN
8967    A      USA     ZWE      Transp AHS-WGHTD-AVRG        2000  0.249632882835876   WITS-TRN
8968    A      USA     ZWE UNCTAD-SoP1 AHS-WGHTD-AVRG        2000   29.5531507778593   WITS-TRN
8969    A      USA     ZWE UNCTAD-SoP2 AHS-WGHTD-AVRG        2000    2.7243509937362   WITS-TRN
8970    A      USA     ZWE UNCTAD-SoP3 AHS-WGHTD-AVRG        2000   5.12801568054237   WITS-TRN
8971    A      USA     ZWE UNCTAD-SoP4 AHS-WGHTD-AVRG        2000 0.0082396181212962   WITS-TRN
  • Related