I am in very unfamiliar terrain with downloading this file set from the Federal Reserve's website (I was hoping for an API but it doesn't appear to exist).
I am not sure what this Fed link format really is and how I would read the data into R.
I tried solutions here and here but I get different errors and not sure where to go. When I try something like this solution:
library(XML)
tf <- tempfile(tmpdir = tdir <- tempdir())
## Download the zip file
download.file("https://www.federalreserve.gov/datadownload/Output.aspx?rel=Z1&filetype=zip", tf)
## Unzip it in the temp folder
xml_files <- unzip(tf, exdir = tdir)
## Parse the first file
doc <- xmlInternalTreeParse(xml_files[1])
I get Warning message: In unzip(tf, exdir = tdir) : zip file is corrupt
So basically, I would like to be able to get the data without having to first go to the website, download the excel version, and then read the excel version into R.
I am assuming there is a way to just reference this link https://www.federalreserve.gov/datadownload/Output.aspx?rel=Z1&filetype=zip to read the data into R without the manual steps associated with first downloading in excel.
The link for different download options for this data is here: https://www.federalreserve.gov/datadownload/Choose.aspx?rel=Z1
Thanks!
CodePudding user response:
Interestingly, I did not encounter the unzip
warning on Linux but did on Windows and one notable difference of operating systems is character encoding.
Therefore, consider the non-default binary mode (i.e., mode = "wb"
) of download.file
which does not raise a warning on Windows. This also makes sense for XML files which usually carry non-ASCII encoding such as UTF-8
.
## Download the zip file
url <- "https://www.federalreserve.gov/datadownload/Output.aspx?rel=Z1&filetype=zip"
download.file(url, tf, mode = "wb")
## Unzip it in the temp folder
xml_files <- unzip(tf, exdir = tdir)
In review, the Z1_data XML looks to be fairly flat and attribute-centric as shown in the head of the document with data observations nested under each series:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<message:MessageGroup xmlns:message="http://www.SDMX.org/resources/SDMXML/schemas/v1_0/message" xmlns:common="http://www.SDMX.org/resources/SDMXML/schemas/v1_0/common" xmlns:frb="http://www.federalreserve.gov/structure/compact/common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.SDMX.org/resources/SDMXML/schemas/v1_0/message SDMXMessage.xsd http://www.federalreserve.gov/structure/compact/common frb_common.xsd"><message:Header xmlns:message="http://www.SDMX.org/resources/SDMXML/schemas/v1_0/message">
<message:ID>OTHER</message:ID>
<message:Test>false</message:Test>
<message:Name>Z.1 Statistical Release</message:Name>
<message:Prepared>2022-06-08T23:46:08</message:Prepared>
<message:Sender id="FRB">
<message:Name>Federal Reserve Board</message:Name>
<message:Contact>
<message:Name>Financial Accounts of the United States Z.1 Coordinator</message:Name>
<message:Telephone>(202)-452-3000</message:Telephone>
</message:Contact>
</message:Sender>
</message:Header>
<frb:DataSet id="Z1" xmlns:kf="http://www.federalreserve.gov/structure/compact/Z1_Z1" xsi:schemaLocation="http://www.federalreserve.gov/structure/compact/Z1_Z1 Z1_Z1.xsd" >
<kf:Series CURRENCY="USD" FREQ="203" SERIES_INSTRUMENT="60100" SERIES_NAME="FA086010005.A" SERIES_PREFIX="FA" SERIES_SECTOR="08" SERIES_TYPE="5" UNIT="Currency" UNIT_MULT="1000000" >
<frb:Annotations>
<common:Annotation>
<common:AnnotationType>Short Description</common:AnnotationType>
<common:AnnotationText>U.S. national income </common:AnnotationText>
</common:Annotation>
<common:Annotation>
<common:AnnotationType>Long Description</common:AnnotationType>
<common:AnnotationText>U.S. national income </common:AnnotationText>
</common:Annotation>
</frb:Annotations>
<frb:Obs OBS_STATUS="A" OBS_VALUE="201261" TIME_PERIOD="1946-12-31" />
<frb:Obs OBS_STATUS="A" OBS_VALUE="218742" TIME_PERIOD="1947-12-31" />
<frb:Obs OBS_STATUS="A" OBS_VALUE="244849" TIME_PERIOD="1948-12-31" />
<frb:Obs OBS_STATUS="A" OBS_VALUE="239729" TIME_PERIOD="1949-12-31" />
<frb:Obs OBS_STATUS="A" OBS_VALUE="266613" TIME_PERIOD="1950-12-31" />
...
Therefore, consider using the undocumented, xmlAttrsToDataFrame
, and assign namespace prefixes for XPath parsing, to build data frames of all series and select observations:
## Parse the first file
doc <- xmlInternalTreeParse(xml_files[1])
nmsp <- c(
kf="http://www.federalreserve.gov/structure/compact/Z1_Z1",
frb="http://www.federalreserve.gov/structure/compact/common"
)
# RETURN ALL SERIES IN DATA
data_series <- XML:::xmlAttrsToDataFrame(
getNodeSet(doc, path="//kf:Series", namespaces=nmsp)
)
str(data_series)
# 'data.frame': 23039 obs. of 9 variables:
# $ CURRENCY : chr "USD" "USD" "USD" "USD" ...
# $ FREQ : chr "203" "162" "203" "162" ...
# $ SERIES_INSTRUMENT: chr "60100" "60100" "61300" "61300" ...
# $ SERIES_NAME : chr "FA086010005.A" "FA086010005.Q" "FA086130003.A" "FA086130003.Q" ...
# $ SERIES_PREFIX : chr "FA" "FA" "FA" "FA" ...
# $ SERIES_SECTOR : chr "08" "08" "08" "08" ...
# $ SERIES_TYPE : chr "5" "5" "3" "3" ...
# $ UNIT : chr "Currency" "Currency" "Currency" "Currency" ...
# $ UNIT_MULT : chr "1000000" "1000000" "1000000" "1000000" ...
# RETURN OBSERVATIONS OF A SINGLE SERIES
FA086010005.A <- XML:::xmlAttrsToDataFrame(
getNodeSet(doc, path="//kf:Series[@SERIES_NAME='FA086010005.A']/frb:Obs", namespaces=nmsp)
)
head(FA086010005.A, 10)
# OBS_STATUS OBS_VALUE TIME_PERIOD
# 1 A 201261 1946-12-31
# 2 A 218742 1947-12-31
# 3 A 244849 1948-12-31
# 4 A 239729 1949-12-31
# 5 A 266613 1950-12-31
# 6 A 307637 1951-12-31
# 7 A 326112 1952-12-31
# 8 A 343840 1953-12-31
# 9 A 343753 1954-12-31
# 10 A 376903 1955-12-31
tail(FA086010005.A, 10)
# OBS_STATUS OBS_VALUE TIME_PERIOD
# 67 A 14099598 2012-12-31
# 68 A 14507115 2013-12-31
# 69 A 15228067 2014-12-31
# 70 A 15749523 2015-12-31
# 71 A 16033408 2016-12-31
# 72 A 16774920 2017-12-31
# 73 A 17673340 2018-12-31
# 74 A 18273065 2019-12-31
# 75 A 17710721 2020-12-31
# 76 A 19937976 2021-12-31
CodePudding user response:
I suspect this should work:
tf <- tempfile(tmpdir = tdir <- tempdir())
## Download the zip file
download.file("https://www.federalreserve.gov/datadownload/Output.aspx?rel=Z1&filetype=zip", tf)
## Unzip it in the temp folder
xml_files <- unzip(tf, exdir = tdir)
library(rvest)
doc <- read_html(xml_files[1])
(I'm trying it now, since the file Z1_data.xml
is 526mb it takes some time to parse, but I can't see any reason it shouldn't work)