Home > database >  Is it possible to write a single function to create raster files from a data.frame object?
Is it possible to write a single function to create raster files from a data.frame object?

Time:12-15

I have loaded the data.frame object in R named "prec" with 1009549 rows and 8 variables. I want to create 60 raster layers of the cumulative "prec" variable values for each x-y coordinates pair at every 4-time step ("tstep" variable, from index 2 to 241) as summarized in the code below. I performed a single function to create each file in 3 steps to achieve it. However, is it possible to write a single function for each step or a single function for the entire code (steps 1 to 4)?

load required packages

library(data.table)
library(raster)

structure of the "prec" data.frame

> headTail(prec)

             x     y prec index tstep variable level                date
1        -47.8 -21.2    0     1     1     prec  1000 2015-01-01 00:00:00
1.1      -47.6 -21.2    0     1     1     prec  1000 2015-01-01 00:00:00
1.2      -47.4 -21.2    0     1     1     prec  1000 2015-01-01 00:00:00
1.3      -47.2 -21.2    0     1     1     prec  1000 2015-01-01 00:00:00
...        ...   ...  ...   ...   ...     <NA>   ...                <NA>
241.4185 -36.8  -7.2    0   241   241     prec  1000 2015-01-01 00:00:59
241.4186 -36.6  -7.2    0   241   241     prec  1000 2015-01-01 00:00:59
241.4187 -36.4  -7.2    0   241   241     prec  1000 2015-01-01 00:00:59
241.4188 -36.2  -7.2    0   241   241     prec  1000 2015-01-01 00:01:00

step 1: subset by tstep

prec_1 <- prec[prec$tstep %in% c(2, 3, 4, 5),]
prec_2 <- prec[prec$tstep %in% c(6, 7, 8, 9),]
prec_3 <- prec[prec$tstep %in% c(10, 11, 12, 13),]
...
prec_60 <- prec[prec$tstep %in% c( 238 , 239 , 240 , 241),]

step 2: coerce to data.table

prec_1_sum <- setDT(prec_1)[, list(prec_sum_1 = sum(prec*1000)), list(x, y)]
prec_2_sum <- setDT(prec_2)[, list(prec_sum_2 = sum(prec*1000)), list(x, y)]
prec_3_sum <- setDT(prec_3)[, list(prec_sum_3 = sum(prec*1000)), list(x, y)]
...
prec_60_sum <- setDT(prec_60)[, list(prec_sum_60 = sum(prec*1000)), list(x, y)]

step 3: create n raster layers

layer_1 <- rasterFromXYZ(prec_1_sum [,1:3], res = c(0.20, 0.20), crs = sp::CRS(" init=epsg:4326"))
layer_2 <- rasterFromXYZ(prec_2_sum [,1:3], res = c(0.20, 0.20), crs = sp::CRS(" init=epsg:4326"))
layer_3 <- rasterFromXYZ(prec_3_sum [,1:3], res = c(0.20, 0.20), crs = sp::CRS(" init=epsg:4326"))
...
layer_60 <- rasterFromXYZ(prec_60_sum [,1:3], res = c(0.20, 0.20), crs = sp::CRS(" init=epsg:4326"))

step 4: stack raster layers

stack_prec <- stack(layer_1, layer_2, layer_3, layer_4, layer_5, layer_6, layer_7, layer_8, layer_9, layer_10,
                    layer_11, layer_12, layer_13, layer_14, layer_15, layer_16, layer_17, layer_18, layer_19, layer_20,
                    layer_21, layer_22, layer_23, layer_24, layer_25, layer_26, layer_27, layer_28, layer_29, layer_30,
                    layer_31, layer_32, layer_33, layer_34, layer_35, layer_36, layer_37, layer_38, layer_39, layer_40,
                    layer_41, layer_42, layer_43, layer_44, layer_45, layer_46, layer_47, layer_48, layer_49, layer_50,
                    layer_51, layer_52, layer_53, layer_54, layer_55, layer_56, layer_57, layer_58, layer_59, layer_60)

CodePudding user response:

It’s always much easier to help when we have sample data we can use. In the future you can use dput(prec) and copy and paste that output for people to use. At the very least some sample data is useful, particularly when you’re using functions that have certain specifications for what the data should look like. Here we generate some data to work with.

library(raster)
#> Loading required package: sp
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following object is masked from 'package:raster':
#> 
#>     shift

set.seed(1)
dat <- 
  data.frame(
    x = rep(seq(-47.8, -47.2, by = 0.2), 241),
    y = -21.2,
    prec = runif(964),
    tstep = rep(1:241, each = 4),
    date = c(rep(as.Date("2015-01-01"), 4), rep(seq(as.Date("2015-01-01"), by = "day", length.out = 60), each = 16))
  )

For your process, it seems a bit more straightforward to group the data rather than break it up. That way you only have to perform the operations on one data set rather than do it many times over. Steps 1 and 2 can be reduced to only a few lines that way. Without thinking too much about optimizing this, I’ve looped over the groups created in the first step to create the raster layers.

raster_layers <- function(dat){
  
  ## some flexibility if there is a differing number of tsteps
  ## it will by default exclude the first tstep as in your example
  min_tstep <- min(dat$tstep)
  max_tstep <- max(dat$tstep)
  breaks <- seq(min_tstep, max_tstep, by = 4)
  
  ## Step 1
  dat$group <- cut(dat$tstep, breaks)
  dat <- dat[!is.na(dat$group), ]
  ## Step 2
  prec <- setDT(dat)[ , list(prec_sum = sum(prec * 1000)), by = list(group, x, y)]
  ## Step 3
  layer <- list()
  group <- unique(prec$group)
  j <- 1
  for (i in group){
    
    raster_dat <- prec[prec$group %in% i , c("x", "y", "prec_sum")]
    ## looks like your plot uses the names for changing labels??
    colnames(raster_dat)[colnames(raster_dat) == "prec_sum"] <- paste0("prec_sum_", j)
    layer[[j]] <- 
      rasterFromXYZ(raster_dat, 
                    res = c(0.20, 0.20), 
                    crs = sp::CRS(" init=epsg:4326"))
    j <- j   1
  }
  ## Step 4
  stack_prec <- stack(unlist(layer))
  
  return(stack_prec)
}

Example

stack_prec <- raster_layers(dat = dat)

stack_prec
#> class      : RasterStack 
#> dimensions : 1, 4, 4, 60  (nrow, ncol, ncell, nlayers)
#> resolution : 0.2, 0.2  (x, y)
#> extent     : -47.9, -47.1, -21.3, -21.1  (xmin, xmax, ymin, ymax)
#> crs        :  init=epsg:4326 
#> names      : prec_sum_1, prec_sum_2, prec_sum_3, prec_sum_4, prec_sum_5, prec_sum_6, prec_sum_7, prec_sum_8, prec_sum_9, prec_sum_10, prec_sum_11, prec_sum_12, prec_sum_13, prec_sum_14, prec_sum_15, ... 
#> min values :  2112.4990,  1124.8232,  2007.5945,  1315.0517,  1729.9294,  1582.8684,  1524.0147,  1098.1529,  2008.5390,   1248.1860,   1680.0199,   1855.4024,    815.4047,   1204.8576,   1416.3943, ... 
#> max values :   2336.186,   2565.158,   2877.219,   2318.115,   3017.609,   2540.536,   2569.019,   2690.884,   2327.706,    2288.046,    3104.792,    2639.530,    2358.953,    2599.245,    2618.676, ...
  • Related