Home > Software engineering >  how to select every months first value and perform calculations in r
how to select every months first value and perform calculations in r

Time:10-19

I have a dataframe df as follows:

   Date       Value
  1-Jun-12     5
  2-Jun-12     10
  3-Jun-12     8
  4-Jun-12     15
  2-Jul-12     12
  3-Jul-12     6 
  4-Jul-12     14
  1-Aug-12     20
  2-Aug-12     10

My output should be:

    Date       Value    mon_diff
  1-Jun-12     5           7 
  2-Jun-12     10
  3-Jun-12     8
  4-Jun-12     15          
  2-Jul-12     12          8
  3-Jul-12     6 
  4-Jul-12     14
  1-Aug-12     20          ...
  2-Aug-12     10

Actually I have to take the next months first value and subtract it from the first value that is 12-5 = 7 then again next months first value to be subtracted from current month value that is 20-12 = 8. Please understand there is no fixed number of rows for date as different months have different number of days. Please help.

CodePudding user response:

Making the approach more robust so that it can be implemented even when there will be entries for multiple years.

library(tidyverse)
library(lubridate)

df %>%
  mutate(
    Date = dmy(Date),
    month_year=paste0(month(Date),'_',year(Date))) %>%
  group_by(month_year) %>%
  filter(Date==min(Date)) %>%
  ungroup() %>%
  mutate(mon_diff=lead(Value)-Value) %>%
  select(-month_year) %>%
  right_join(df %>% mutate(Date=dmy(Date)), by=c("Date", "Value")) %>%
  arrange(Date)-> output_df

Output:

Date       Value mon_diff
  <date>     <int>    <int>
1 2012-06-01     5        7
2 2012-06-02    10       NA
3 2012-06-03     8       NA
4 2012-06-04    15       NA
5 2012-07-02    12        8
6 2012-07-03     6       NA
7 2012-07-04    14       NA
8 2012-08-01    20       NA
9 2012-08-02    10       NA

Data:

read.table(text='Date       Value
1-Jun-12     5
2-Jun-12     10
3-Jun-12     8
4-Jun-12     15
2-Jul-12     12
3-Jul-12     6 
4-Jul-12     14
1-Aug-12     20
2-Aug-12     10',header=T)-> df

CodePudding user response:

Using the data shown reproducibly in the Note at the end, convert the Date to yearmon (year and month with no day) giving ym and then for each row find the first element with the same year month and the first row with the next year month. Note that yearmon class represents year and month internally as year fraction where fraction = 0, 1/12, ..., 11/12 so the next month is found by adding 1/12. Evaluate Value for those rows taking the difference. Finally NA out diff for those rows with duplicated year and month values. If you are using tidyverse then use the same code but with mutate replacing transform.

library(zoo)

ym <- as.yearmon(DF$Date, format = "%d-%b-%y")
DF |>
  transform(diff = Value[match(ym   1/12, ym)] - Value[match(ym, ym)]) |>
  transform(diff = ifelse(duplicated(diff), NA, diff))

giving:

      Date Value diff
1 1-Jun-12     5    7
2 2-Jun-12    10   NA
3 3-Jun-12     8   NA
4 4-Jun-12    15   NA
5 2-Jul-12    12    8
6 3-Jul-12     6   NA
7 4-Jul-12    14   NA
8 1-Aug-12    20   NA
9 2-Aug-12    10   NA

Note

Lines <- "Date       Value
  1-Jun-12     5
  2-Jun-12     10
  3-Jun-12     8
  4-Jun-12     15
  2-Jul-12     12
  3-Jul-12     6 
  4-Jul-12     14
  1-Aug-12     20
  2-Aug-12     10"
DF <- read.table(text = Lines,  header = TRUE)
  • Related