How to read time series climate data (i.e., precipitation) without extension in R?-CodePudding

I have a file without any extension, which contain gridded daily temp_max(deg_c). The first two rows from the fourth column are indicates longitude and latitude respectively for each grid. Whereas First three column from 3rd row indicates year-month-day. And from the fourth column data are Tmax (Deg_c). For example please refer to following figure

CodePudding user response：

You can:

use fread from data.table to quickly import this large dataset
move latitude and longitude to the column names
use make_date from lubridate to convert year, months and days into a single date column
use select and slice from dplyr to remove useless rows and columns
reshape your dataset to a longer format with pivot_wider (dplyr)

library(dplyr)
library(lubridate)
library(tidyr)

dat <- data.table::fread("yourpath/TMaxData")

colnames(dat)[-c(1, 2, 3)] <- as.character(dat[1:2, -c(1, 2, 3)])

dat |> 
  mutate(date = make_date(V1, V2, V3)) |> 
  select(-c(V1, V2, V3)) |> 
  relocate(date) |> 
  slice(-c(1, 2)) |>
  as_tibble() |>
  pivot_longer(-date, names_to = "long_lat", values_to = "Tmax") |> 
  separate(long_lat, into = c("long", "lat"), sep = ", ") |> 
  mutate(long = gsub("c\\(", "", long) |> as.numeric(),
         lat = gsub("\\)", "", lat) |> as.numeric())

#> # A tibble: 1,402,560 × 4
#>    date        long   lat  Tmax
#>    <date>     <dbl> <dbl> <dbl>
#>  1 1951-01-01  88.9  27.1 14.4 
#>  2 1951-01-01  89.1  26.9 20.8 
#>  3 1951-01-01  89.1  27.1 12.6 
#>  4 1951-01-01  89.1  27.4  3.37
#>  5 1951-01-01  89.1  27.6 -0.35
#>  6 1951-01-01  89.4  26.9 20.6 
#>  7 1951-01-01  89.4  27.1  9.55
#>  8 1951-01-01  89.4  27.4  7.23
#>  9 1951-01-01  89.4  27.6  1.93
#> 10 1951-01-01  89.4  27.9 -1.17
#> # … with 1,402,550 more rows

^{Created on 2022-07-08 by the reprex package (v2.0.1)}

I'm sure there are neater solutions - I'm not familiar with raster time series data. Hopefully, this will be helpful.