I have a timeseries of geotiff files I'd like to stack in R using stars. Here's the first two:
urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))
library(stars)
stars::read_stars(urls, along="time")
Errors with:
Error in c.stars_proxy(`3` = list(gep01.t00z.pgrb2a.0p50.f003.tif = "/vsicurl/https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/cogs/gefs.20221201/gep01.t00z.pgrb2a.0p50.f003.tif"), :
don't know how to merge arrays: please specify parameter along
Context: bands contain both time band info
This fails because the dimensions do not match, which happens because the files have concatenated temporal information into the band names:
x<- lapply(urls, read_stars)
x
produces:
[[1]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e 05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max.
gep01.t00z.pgrb2a.0p50.f003.ti... 50026.01 98094.81 101138 98347.42 101845.2 104605.2
dimension(s):
from to offset delta refsys point
x 1 720 -180.25 0.5 Coordinate System importe... FALSE
y 1 361 90.25 -0.5 Coordinate System importe... FALSE
band 1 8 NA NA NA NA
values x/y
x NULL [x]
y NULL [y]
band PRES:surface:3 hour fcst,...,DLWRF:surface:0-3 hour ave fcst
[[2]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e 05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max.
gep01.t00z.pgrb2a.0p50.f006.ti... 50029.83 98101.83 101170.6 98337.52 101825 104588.2
dimension(s):
from to offset delta refsys point
x 1 720 -180.25 0.5 Coordinate System importe... FALSE
y 1 361 90.25 -0.5 Coordinate System importe... FALSE
band 1 8 NA NA NA NA
values x/y
x NULL [x]
y NULL [y]
band PRES:surface:6 hour fcst,...,DLWRF:surface:0-6 hour ave fcst
Note the band names would align except for the existence of the timestamp being tacked on, e.g. PRES:surface:3 hour fcst
vs PRES:surface:6 hour fcst
.
How can I best read in these files so that I have dimensions of x,y,band, and time in my stars object?
alternatives: terra?
How about terra
? Note that terra
is happy to read these files in directly, but treats this as 16 unique bands. Can I re-align that so that I have the original 8 bands along a new "time" dimension? (I recognize stars emphasizes 'spatio-temporal', maybe the such a cube is out of scope to terra?) Also note that terra for some reason mangles the timestamp in these band names:
x <- terra::rast(urls)
x
class : SpatRaster
dimensions : 361, 720, 16 (nrow, ncol, nlyr)
resolution : 0.5, 0.5 (x, y)
extent : -180.25, 179.75, -90.25, 90.25 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat Coordinate System imported from GRIB file
sources : gep01.t00z.pgrb2a.0p50.f003.tif (8 layers)
gep01.t00z.pgrb2a.0p50.f006.tif (8 layers)
names : PRES:~ fcst, TMP:2~ fcst, RH:2 ~ fcst, UGRD:~ fcst, VGRD:~ fcst, APCP:~ fcst, .
CodePudding user response:
Okay, this isn't particularly elegant but seems to get the job done:
urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))
library(stars)
#stars::read_stars(urls, along="time") # no luck!
## grab unstacked proxy object for each geotiff
x <- lapply(urls, read_stars)
# extract band-names-part
band_names <- st_get_dimension_values(x[[1]], "band") |>
stringr::str_extract("([A-Z] ):") |>
str_remove(":")
# apply corrected band-names
x1 <- lapply(x, st_set_dimensions, "band", band_names)
# at last, we can stack into a cube:
x1 <- do.call(c, c(x1, along="time"))
# and add correct date timestamps to the new time dimension
dates <- as.Date("2022-12-01") lubridate::hours(c(3,6))
x1 <- st_set_dimensions(x1, "time", dates)
x1
CodePudding user response:
With terra it is pretty easy to make a time-series for each variable as I show below.
urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))
library(terra)
r <- rast(urls)
Extract two variables of interest
nms <- names(r)
tmp <- r[[grep("TMP", nms)]]
rh <- r[[grep("RH", nms)]]
# set time
tm <- as.POSIXct("2022-12-01", tz="GMT") c(3,6) * 3600
time(rh) <- tm
time(tmp) <- tm
And you could combine them into a SpatRasterDatset like this:
s <- sds(list(tmp=tmp, rh=rh))
An alternative path to get to the same point would be to start with a SpatRasterDataset and subset it.
sd <- sds(urls)
nl <- 1:length(sd)
nms <- names(sd[1])
tmp2 <- rast(sd[nl, grep("TMP", nms)])
time(tmp2) <- tm
rh2 <- rast(sd[nl, grep("RH", nms)])
time(rh2) <- tm
I made the subsetting work a little nicer in terra version 1.7-5
urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/", c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))
library(terra)
#terra 1.7.5
sd <- sds(urls)
tmp <- sd[,2]
tmp
#class : SpatRaster
#dimensions : 361, 720, 2 (nrow, ncol, nlyr)
#resolution : 0.5, 0.5 (x, y)
#extent : -180.25, 179.75, -90.25, 90.25 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat Coordinate System imported from GRIB file
#sources : gep01.t00z.pgrb2a.0p50.f003.tif
# gep01.t00z.pgrb2a.0p50.f006.tif
#names : TMP:2 m above g~Temperature [C], TMP:2 m above g~Temperature [C]
#unit : C, C
#time : 2022-12-01 03:00:00 to 2022-12-01 06:00:00 UTC
As for the layer names containing the forecast time, that is just because that is what is in the tif metadata. It looks like that was a decision made when they were created from the original GRIB files.
The latitude extent going beyond the north and south poles is an interesting feature of this dataset.