Home > Blockchain >  stack geotiff with stars 'along' when 'band' dimension contains band time info
stack geotiff with stars 'along' when 'band' dimension contains band time info

Time:01-27

I have a timeseries of geotiff files I'd like to stack in R using stars. Here's the first two:

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(stars)
stars::read_stars(urls, along="time")

Errors with:

Error in c.stars_proxy(`3` = list(gep01.t00z.pgrb2a.0p50.f003.tif = "/vsicurl/https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/cogs/gefs.20221201/gep01.t00z.pgrb2a.0p50.f003.tif"),  : 
  don't know how to merge arrays: please specify parameter along

Context: bands contain both time band info

This fails because the dimensions do not match, which happens because the files have concatenated temporal information into the band names:

x<- lapply(urls, read_stars)
x

produces:

[[1]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e 05 cells:
                                       Min.  1st Qu. Median     Mean  3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f003.ti...  50026.01 98094.81 101138 98347.42 101845.2 104605.2
dimension(s):
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:3 hour fcst,...,DLWRF:surface:0-3 hour ave fcst    

[[2]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e 05 cells:
                                       Min.  1st Qu.   Median     Mean 3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f006.ti...  50029.83 98101.83 101170.6 98337.52  101825 104588.2
dimension(s):
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:6 hour fcst,...,DLWRF:surface:0-6 hour ave fcst    

Note the band names would align except for the existence of the timestamp being tacked on, e.g. PRES:surface:3 hour fcst vs PRES:surface:6 hour fcst.

How can I best read in these files so that I have dimensions of x,y,band, and time in my stars object?

alternatives: terra?

How about terra? Note that terra is happy to read these files in directly, but treats this as 16 unique bands. Can I re-align that so that I have the original 8 bands along a new "time" dimension? (I recognize stars emphasizes 'spatio-temporal', maybe the such a cube is out of scope to terra?) Also note that terra for some reason mangles the timestamp in these band names:

x <- terra::rast(urls)
x
class       : SpatRaster 
dimensions  : 361, 720, 16  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat Coordinate System imported from GRIB file 
sources     : gep01.t00z.pgrb2a.0p50.f003.tif  (8 layers) 
              gep01.t00z.pgrb2a.0p50.f006.tif  (8 layers) 
names       : PRES:~ fcst, TMP:2~ fcst, RH:2 ~ fcst, UGRD:~ fcst, VGRD:~ fcst, APCP:~ fcst, .

CodePudding user response:

Okay, this isn't particularly elegant but seems to get the job done:

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(stars)
#stars::read_stars(urls, along="time") # no luck!


## grab unstacked proxy object for each geotiff
x <- lapply(urls, read_stars)

# extract band-names-part
band_names <- st_get_dimension_values(x[[1]], "band") |> 
  stringr::str_extract("([A-Z] ):") |>
  str_remove(":")
# apply corrected band-names
x1 <- lapply(x, st_set_dimensions, "band", band_names)

# at last, we can stack into a cube:
x1 <- do.call(c, c(x1, along="time"))

# and add correct date timestamps to the new time dimension
dates <- as.Date("2022-12-01")   lubridate::hours(c(3,6))
x1 <- st_set_dimensions(x1, "time", dates)
x1

CodePudding user response:

With terra it is pretty easy to make a time-series for each variable as I show below.

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(terra)
r <- rast(urls)

Extract two variables of interest

nms <- names(r)
tmp <- r[[grep("TMP", nms)]]
rh <- r[[grep("RH", nms)]]

# set time
tm <- as.POSIXct("2022-12-01", tz="GMT")   c(3,6) * 3600
time(rh) <- tm 
time(tmp) <- tm

And you could combine them into a SpatRasterDatset like this:

s <- sds(list(tmp=tmp, rh=rh))

An alternative path to get to the same point would be to start with a SpatRasterDataset and subset it.

sd <- sds(urls)
nl <- 1:length(sd)
nms <- names(sd[1])

tmp2 <- rast(sd[nl, grep("TMP", nms)])
time(tmp2) <- tm

rh2 <- rast(sd[nl, grep("RH", nms)])
time(rh2) <- tm

I made the subsetting work a little nicer in terra version 1.7-5

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/", c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(terra)
#terra 1.7.5
sd <- sds(urls)
tmp <- sd[,2]

tmp
#class       : SpatRaster 
#dimensions  : 361, 720, 2  (nrow, ncol, nlyr)
#resolution  : 0.5, 0.5  (x, y)
#extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat Coordinate System imported from GRIB file 
#sources     : gep01.t00z.pgrb2a.0p50.f003.tif  
#              gep01.t00z.pgrb2a.0p50.f006.tif  
#names       : TMP:2 m above g~Temperature [C], TMP:2 m above g~Temperature [C] 
#unit        :                               C,                               C 
#time        : 2022-12-01 03:00:00 to 2022-12-01 06:00:00 UTC 

As for the layer names containing the forecast time, that is just because that is what is in the tif metadata. It looks like that was a decision made when they were created from the original GRIB files.

The latitude extent going beyond the north and south poles is an interesting feature of this dataset.

  • Related