I have a dataframe df as follows:
Date Value
1-Jun-12 5
2-Jun-12 10
3-Jun-12 8
4-Jun-12 15
2-Jul-12 12
3-Jul-12 6
4-Jul-12 14
1-Aug-12 20
2-Aug-12 10
My output should be:
Date Value mon_diff
1-Jun-12 5 7
2-Jun-12 10
3-Jun-12 8
4-Jun-12 15
2-Jul-12 12 8
3-Jul-12 6
4-Jul-12 14
1-Aug-12 20 ...
2-Aug-12 10
Actually I have to take the next months first value and subtract it from the first value that is 12-5 = 7 then again next months first value to be subtracted from current month value that is 20-12 = 8. Please understand there is no fixed number of rows for date as different months have different number of days. Please help.
CodePudding user response:
Making the approach more robust so that it can be implemented even when there will be entries for multiple years.
library(tidyverse)
library(lubridate)
df %>%
mutate(
Date = dmy(Date),
month_year=paste0(month(Date),'_',year(Date))) %>%
group_by(month_year) %>%
filter(Date==min(Date)) %>%
ungroup() %>%
mutate(mon_diff=lead(Value)-Value) %>%
select(-month_year) %>%
right_join(df %>% mutate(Date=dmy(Date)), by=c("Date", "Value")) %>%
arrange(Date)-> output_df
Output:
Date Value mon_diff
<date> <int> <int>
1 2012-06-01 5 7
2 2012-06-02 10 NA
3 2012-06-03 8 NA
4 2012-06-04 15 NA
5 2012-07-02 12 8
6 2012-07-03 6 NA
7 2012-07-04 14 NA
8 2012-08-01 20 NA
9 2012-08-02 10 NA
Data:
read.table(text='Date Value
1-Jun-12 5
2-Jun-12 10
3-Jun-12 8
4-Jun-12 15
2-Jul-12 12
3-Jul-12 6
4-Jul-12 14
1-Aug-12 20
2-Aug-12 10',header=T)-> df
CodePudding user response:
Using the data shown reproducibly in the Note at the end, convert the Date to yearmon (year and month with no day) giving ym
and then for each row find the first element with the same year month and the first row with the next year month. Note that yearmon class represents year and month internally as year fraction where fraction = 0, 1/12, ..., 11/12 so the next month is found by adding 1/12. Evaluate Value
for those rows taking the difference. Finally NA out diff
for those rows with duplicated year and month values. If you are using tidyverse then use the same code but with mutate
replacing transform
.
library(zoo)
ym <- as.yearmon(DF$Date, format = "%d-%b-%y")
DF |>
transform(diff = Value[match(ym 1/12, ym)] - Value[match(ym, ym)]) |>
transform(diff = ifelse(duplicated(diff), NA, diff))
giving:
Date Value diff
1 1-Jun-12 5 7
2 2-Jun-12 10 NA
3 3-Jun-12 8 NA
4 4-Jun-12 15 NA
5 2-Jul-12 12 8
6 3-Jul-12 6 NA
7 4-Jul-12 14 NA
8 1-Aug-12 20 NA
9 2-Aug-12 10 NA
Note
Lines <- "Date Value
1-Jun-12 5
2-Jun-12 10
3-Jun-12 8
4-Jun-12 15
2-Jul-12 12
3-Jul-12 6
4-Jul-12 14
1-Aug-12 20
2-Aug-12 10"
DF <- read.table(text = Lines, header = TRUE)