Home > Software engineering >  Aggregating data from daily to monthly, where each column contains data for a day, and each row repr
Aggregating data from daily to monthly, where each column contains data for a day, and each row repr

Time:06-02

I have a large dataframe where the first two columns are longitude and latitude values (making each row represent a point of a specific longitude and latitude), and the succeeding columns are daily data for a long period of time (couple of years). As such, the dataframe is very large.

It looks like this: large dataframe

Is there a way to transform this dataframe to a monthly one?

  • First I want to get the total of each month per year,
  • Then I want to get the average of the totals of each month, eventually getting a dataframe with 14 columns: lon, lat, january, february, ..., december.

Is this possible?

I was able to do this when my data only spanned one year using the following code:

mymonthlydata$`1`=apply(mydailydata[3:33],1,sum)

where I did this for each month (columns 3 to 33 are the days of january for example), but now that I have multiple years, I am not quite sure what to do. Any help would be appreciated, even just pointing me to the right direction. Thank you.

Some sample data that mimics OP's screenshot that can be used

dt <- data.table::data.table(lon = (0:9) abs(rnorm(10)), lat = -89.5)[, as.character(seq.Date(as.Date("2015-01-01"), as.Date("2021-12-31"), by = "1 days")) := abs(rnorm(10)/10^8)]

str(dt)
# Classes ‘data.table’ and 'data.frame':    10 obs. of  2559 variables:
#  $ lon       : num  0.702 1.77 3.55 4.294 4.712 ...
#  $ lat       : num  -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5
#  $ 2015-01-01: num  0.00000000485 0.00000000474 0.0000000101 0.00000001026 0.00000000282 ...
#  $ 2015-01-02: num  0.00000000485 0.00000000474 0.0000000101 0.00000001026 0.00000000282 ...
#  $ 2015-01-03: num  0.00000000485 0.00000000474 0.0000000101 0.00000001026 0.00000000282 ...

CodePudding user response:

sample data that mimics OP's screenshot

dt <- data.table(lon = (0:9) abs(rnorm(10)), lat = -89.5)[, as.character(seq.Date(as.Date("2015-01-01"), as.Date("2021-12-31"), by = "1 days")) := abs(rnorm(10)/10^8)]

data.table solution

library(data.table)
library(lubridate)

# setDT(dt) needed only if you have a data.frame (my sample is already a data.table)

dt_long <- melt(dt, id.vars = c("lon", "lat"), variable.name = "date")
dt_long[, `:=` (year = year(date), month = month(date, label = T, abbr = F)), by = date]

dt_long_sum <- dt_long[, .(sum = sum(value)), by = .(lon, lat, year, month)]
dt_long_mean <- dt_long_sum[, .(mean = mean(sum)), by = .(lon, lat, month)]

dcast(dt_long_mean, lon   lat ~ month, value.var = "mean")

results

#        lon   lat        January       February          March          April            May           June           July         August      September        October
#  1: 0.7016 -89.5 0.000000150444 0.000000137272 0.000000150444 0.000000145591 0.000000150444 0.000000145591 0.000000150444 0.000000150444 0.000000145591 0.000000150444
#  2: 1.7702 -89.5 0.000000147013 0.000000134141 0.000000147013 0.000000142270 0.000000147013 0.000000142270 0.000000147013 0.000000147013 0.000000142270 0.000000147013
#  3: 3.5504 -89.5 0.000000313048 0.000000285639 0.000000313048 0.000000302950 0.000000313048 0.000000302950 0.000000313048 0.000000313048 0.000000302950 0.000000313048
#  4: 4.2941 -89.5 0.000000318180 0.000000290321 0.000000318180 0.000000307916 0.000000318180 0.000000307916 0.000000318180 0.000000318180 0.000000307916 0.000000318180
#  5: 4.7121 -89.5 0.000000087354 0.000000079705 0.000000087354 0.000000084536 0.000000087354 0.000000084536 0.000000087354 0.000000087354 0.000000084536 0.000000087354
#  6: 5.6789 -89.5 0.000000202693 0.000000184945 0.000000202693 0.000000196154 0.000000202693 0.000000196154 0.000000202693 0.000000202693 0.000000196154 0.000000202693
#  7: 6.4409 -89.5 0.000000168332 0.000000153593 0.000000168332 0.000000162902 0.000000168332 0.000000162902 0.000000168332 0.000000168332 0.000000162902 0.000000168332
#  8: 8.3095 -89.5 0.000000037445 0.000000034166 0.000000037445 0.000000036237 0.000000037445 0.000000036237 0.000000037445 0.000000037445 0.000000036237 0.000000037445
#  9: 8.8061 -89.5 0.000000051242 0.000000046756 0.000000051242 0.000000049589 0.000000051242 0.000000049589 0.000000051242 0.000000051242 0.000000049589 0.000000051242
# 10: 9.2667 -89.5 0.000000338410 0.000000308779 0.000000338410 0.000000327493 0.000000338410 0.000000327493 0.000000338410 0.000000338410 0.000000327493 0.000000338410
#           November       December
#  1: 0.000000145591 0.000000150444
#  2: 0.000000142270 0.000000147013
#  3: 0.000000302950 0.000000313048
#  4: 0.000000307916 0.000000318180
#  5: 0.000000084536 0.000000087354
#  6: 0.000000196154 0.000000202693
#  7: 0.000000162902 0.000000168332
#  8: 0.000000036237 0.000000037445
#  9: 0.000000049589 0.000000051242
# 10: 0.000000327493 0.000000338410
  • Related