I have a file without any extension, which contain gridded daily temp_max(deg_c). The first two rows from the fourth column are indicates longitude and latitude respectively for each grid. Whereas First three column from 3rd row indicates year-month-day. And from the fourth column data are Tmax (Deg_c). For example please refer to following figure
CodePudding user response:
You can:
- use
fread
fromdata.table
to quickly import this large dataset - move latitude and longitude to the column names
- use
make_date
fromlubridate
to convert year, months and days into a single date column - use
select
andslice
fromdplyr
to remove useless rows and columns - reshape your dataset to a longer format with
pivot_wider
(dplyr
)
library(dplyr)
library(lubridate)
library(tidyr)
dat <- data.table::fread("yourpath/TMaxData")
colnames(dat)[-c(1, 2, 3)] <- as.character(dat[1:2, -c(1, 2, 3)])
dat |>
mutate(date = make_date(V1, V2, V3)) |>
select(-c(V1, V2, V3)) |>
relocate(date) |>
slice(-c(1, 2)) |>
as_tibble() |>
pivot_longer(-date, names_to = "long_lat", values_to = "Tmax") |>
separate(long_lat, into = c("long", "lat"), sep = ", ") |>
mutate(long = gsub("c\\(", "", long) |> as.numeric(),
lat = gsub("\\)", "", lat) |> as.numeric())
#> # A tibble: 1,402,560 × 4
#> date long lat Tmax
#> <date> <dbl> <dbl> <dbl>
#> 1 1951-01-01 88.9 27.1 14.4
#> 2 1951-01-01 89.1 26.9 20.8
#> 3 1951-01-01 89.1 27.1 12.6
#> 4 1951-01-01 89.1 27.4 3.37
#> 5 1951-01-01 89.1 27.6 -0.35
#> 6 1951-01-01 89.4 26.9 20.6
#> 7 1951-01-01 89.4 27.1 9.55
#> 8 1951-01-01 89.4 27.4 7.23
#> 9 1951-01-01 89.4 27.6 1.93
#> 10 1951-01-01 89.4 27.9 -1.17
#> # … with 1,402,550 more rows
Created on 2022-07-08 by the reprex package (v2.0.1)
I'm sure there are neater solutions - I'm not familiar with raster time series data. Hopefully, this will be helpful.