I need to read this dataset https://psl.noaa.gov/data/correlation/tsa.data in R.
I tryed:
TSA <- read.table("https://psl.noaa.gov/data/correlation/tsa.data", fileEncoding="UTF-16", dec=";")
In this dataset each column represents the months from January to December (from left to right) and the rows the years.
I need to build a time series dataframe from it so I can analyze the data.
I imagine something as ideal if I could pivot the table to generate a dataframe (tsibble) like this:
YearMonth TSA
1948 jan -0.22
1948 feb -0.12
1948 mar 0.04
1948 may -0.21
... ...
2021 sep 0.55
As I had difficulty reading this format in R, I think I'll get good recommendations for help around here.
CodePudding user response:
If you are willing to use the tidyverse
, you could try something like this. Note I use skip = 1
to skip the first line and n_max = 74
to skip the last two. This might need to be adjusted for other datasets.
library(tidyverse)
read_table("https://psl.noaa.gov/data/correlation/tsa.data",
skip = 1,
n_max = 74,
col_names = FALSE) %>%
set_names(c('year', month.abb)) %>%
pivot_longer(-year,
names_to = 'month',
values_to = 'TSA')
#>
#> # A tibble: 888 x 3
#> year month TSA
#> <dbl> <chr> <dbl>
#> 1 1948 Jan -0.22
#> 2 1948 Feb -0.12
#> 3 1948 Mar 0.04
#> 4 1948 Apr -0.21
#> 5 1948 May -0.4
#> 6 1948 Jun -0.64
#> 7 1948 Jul -0.4
#> 8 1948 Aug -0.53
#> 9 1948 Sep -0.41
#> 10 1948 Oct -0.42
#> # ... with 878 more rows
Created on 2021-10-19 by the reprex package (v2.0.0)