Home > other >  How to put the average value into particular date?
How to put the average value into particular date?

Time:10-21

I have a data set consist of the average value from particular date. Please see the example of dataset below:

structure(list(startdate = structure(c(14951, 14958, 14965, 14978, 
14985, 14992), class = "Date"), enddate = structure(c(14957, 
14964, 14971, 14985, 14992, 14999), class = "Date"), Conc = c(5.873, 
14.591, 8.854, NA, 20.228, 74.57)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))

I want to produce new data set consist of daily data based on above dataset. Please see the new dataset I desired below:

structure(list(Date = structure(c(14951, 14952, 14953, 14954, 
14955, 14956, 14957, 14958, 14959, 14960, 14961, 14962, 14963, 
14964, 14965, 14966, 14967, 14968, 14969, 14970, 14971, 14972, 
14973, 14974, 14975, 14976, 14977, 14978, 14979, 14980, 14981, 
14982, 14983, 14984, 14985, 14986, 14987, 14988, 14989, 14990, 
14991, 14992, 14993, 14994, 14995, 14996, 14997, 14998), class = "Date"), 
    Conc = c(5.873, 5.873, 5.873, 5.873, 5.873, 5.873, 5.873, 
    14.591, 14.591, 14.591, 14.591, 14.591, 14.591, 14.591, 8.854, 
    8.854, 8.854, 8.854, 8.854, 8.854, 8.854, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, 20.228, 20.228, 20.228, 
    20.228, 20.228, 20.228, 20.228, 74.57, 74.57, 74.57, 74.57, 
    74.57, 74.57, 74.57)), row.names = c(NA, -48L), class = c("tbl_df", 
"tbl", "data.frame"))

I can do the opposite (calculate the average value from particular date), but I don't have any idea how to do the opposite. I have a big data set and if I can do this, it will save my life. Please anyone can help me? Thank you in advance.

CodePudding user response:

We first create a data.frame containing all days between the first startdate and the last enddate. Then we join the averages data.frame and group by each "section" of available values for Conc. Within each of these groups we check if the date is within startdate and enddate and if so replace the value with the first value for Conc.

library(dplyr)
library(tidyr)
data.frame(startdate = seq(min(df1$startdate), max(df1$enddate), by=1)) %>% 
  left_join(df1) %>% 
  group_by(cumsum(!is.na(Conc))) %>% 
  mutate(Conc = ifelse(startdate <= first(enddate), first(Conc), NA)) %>% 
  ungroup() %>% 
  select(Date = startdate, Conc)

Returns:

         Date   Conc
1  2010-12-08  5.873
2  2010-12-09  5.873
3  2010-12-10  5.873
4  2010-12-11  5.873
5  2010-12-12  5.873
6  2010-12-13  5.873
7  2010-12-14  5.873
8  2010-12-15 14.591
9  2010-12-16 14.591
10 2010-12-17 14.591
11 2010-12-18 14.591
12 2010-12-19 14.591
13 2010-12-20 14.591
14 2010-12-21 14.591
15 2010-12-22  8.854
16 2010-12-23  8.854
17 2010-12-24  8.854
18 2010-12-25  8.854
19 2010-12-26  8.854
20 2010-12-27  8.854
21 2010-12-28  8.854
22 2010-12-29     NA
23 2010-12-30     NA
24 2010-12-31     NA
25 2011-01-01     NA
26 2011-01-02     NA
27 2011-01-03     NA
28 2011-01-04     NA
29 2011-01-05     NA
30 2011-01-06     NA
31 2011-01-07     NA
32 2011-01-08     NA
33 2011-01-09     NA
34 2011-01-10     NA
35 2011-01-11 20.228
36 2011-01-12 20.228
37 2011-01-13 20.228
38 2011-01-14 20.228
39 2011-01-15 20.228
40 2011-01-16 20.228
41 2011-01-17 20.228
42 2011-01-18 74.570
43 2011-01-19 74.570
44 2011-01-20 74.570
45 2011-01-21 74.570
46 2011-01-22 74.570
47 2011-01-23 74.570
48 2011-01-24 74.570
49 2011-01-25 74.570

Note: It seems your expected output is missing the last value for 2011-01-25

CodePudding user response:

Here's a tidyverse solution that leverages tidyr::separate_rows() and string manipulation, without the need for any joining or grouping. However, for very long windows of time, where startdate and enddate are very far apart, this solution might prove computationally inefficient — or even strain memory limits — when attempting to cram each sequence of dates into a single string.

Solution

library(tidyverse)


# ...
# Code to generate your first dataframe 'dataset_1'.
# ...


results <- dataset_1 %>%
  # Create a 'Date' column, where each value is a string enumerating each day (as an integer)
  # between 'startdate' and 'enddate':
  #   "14951 14952 14953 14954 14955 14956 14957"
  #   "14958 14959 14960 14961 14962 14963 14964"
  #   ⋮
  rowwise() %>% mutate(Date = paste(startdate:enddate, collapse = " ")) %>%
  # Use 'tidyr::separate_rows()' to pivot each day into its own row.
  separate_rows(Date, sep = " ", convert = TRUE) %>%
  # Format each day as a 'Date' object.
  mutate(Date = structure(Date, class = "Date")) %>%
  # Format the dataset as desired.
  select(Date, Conc)


# View results.
print(results, n = Inf)

Results

Given a dataset_1 like the one you reproduced

dataset_1 <- structure(
  list(
    startdate = structure(
      c(14951, 14958, 14965, 14978, 14985, 14992),
      class = "Date"
    ),
    enddate = structure(
      c(14957, 14964, 14971, 14985, 14992, 14999),
      class = "Date"
    ), 
    Conc = c(5.873, 14.591, 8.854, NA, 20.228, 74.57)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

this solution should yield the following results:

# A tibble: 45 x 2
   Date        Conc
   <date>     <dbl>
 1 2010-12-08  5.87
 2 2010-12-09  5.87
 3 2010-12-10  5.87
 4 2010-12-11  5.87
 5 2010-12-12  5.87
 6 2010-12-13  5.87
 7 2010-12-14  5.87
 8 2010-12-15 14.6 
 9 2010-12-16 14.6 
10 2010-12-17 14.6 
11 2010-12-18 14.6 
12 2010-12-19 14.6 
13 2010-12-20 14.6 
14 2010-12-21 14.6 
15 2010-12-22  8.85
16 2010-12-23  8.85
17 2010-12-24  8.85
18 2010-12-25  8.85
19 2010-12-26  8.85
20 2010-12-27  8.85
21 2010-12-28  8.85
22 2011-01-04 NA   
23 2011-01-05 NA   
24 2011-01-06 NA   
25 2011-01-07 NA   
26 2011-01-08 NA   
27 2011-01-09 NA   
28 2011-01-10 NA   
29 2011-01-11 NA   
30 2011-01-11 20.2 
31 2011-01-12 20.2 
32 2011-01-13 20.2 
33 2011-01-14 20.2 
34 2011-01-15 20.2 
35 2011-01-16 20.2 
36 2011-01-17 20.2 
37 2011-01-18 20.2 
38 2011-01-18 74.6 
39 2011-01-19 74.6 
40 2011-01-20 74.6 
41 2011-01-21 74.6 
42 2011-01-22 74.6 
43 2011-01-23 74.6 
44 2011-01-24 74.6 
45 2011-01-25 74.6 
  •  Tags:  
  • r
  • Related