How to read .data format in R?-CodePudding

I need to read this dataset https://psl.noaa.gov/data/correlation/tsa.data in R.

I tryed:

TSA <- read.table("https://psl.noaa.gov/data/correlation/tsa.data", fileEncoding="UTF-16", dec=";")

In this dataset each column represents the months from January to December (from left to right) and the rows the years.

I need to build a time series dataframe from it so I can analyze the data.

I imagine something as ideal if I could pivot the table to generate a dataframe (tsibble) like this:

YearMonth    TSA
1948 jan   -0.22
1948 feb   -0.12
1948 mar    0.04
1948 may   -0.21
 ...        ...
2021 sep    0.55

As I had difficulty reading this format in R, I think I'll get good recommendations for help around here.

CodePudding user response：

If you are willing to use the tidyverse, you could try something like this. Note I use skip = 1 to skip the first line and n_max = 74 to skip the last two. This might need to be adjusted for other datasets.

library(tidyverse)

read_table("https://psl.noaa.gov/data/correlation/tsa.data",
           skip = 1, 
           n_max = 74,
           col_names = FALSE) %>%
  set_names(c('year', month.abb)) %>%
  pivot_longer(-year, 
               names_to = 'month',
               values_to = 'TSA')
#> 
#> # A tibble: 888 x 3
#>     year month   TSA
#>    <dbl> <chr> <dbl>
#>  1  1948 Jan   -0.22
#>  2  1948 Feb   -0.12
#>  3  1948 Mar    0.04
#>  4  1948 Apr   -0.21
#>  5  1948 May   -0.4 
#>  6  1948 Jun   -0.64
#>  7  1948 Jul   -0.4 
#>  8  1948 Aug   -0.53
#>  9  1948 Sep   -0.41
#> 10  1948 Oct   -0.42
#> # ... with 878 more rows

^{Created on 2021-10-19 by the reprex package (v2.0.0)}