I have a time-series data frame:
[https://www.dropbox.com/s/elaxfuvqyip1eq8/SampleDF.csv?dl=0][1]
My intention is to divide this DataFrame into different seasons according to:
- winter: Dec Jan Feb
- Pre-monsoon: Mar Apr May Jun15 (i.e. till 15th of June)
- Monsoon: 15Jun Jul Aug Sep (i.e. from 15th of June)
- Post-monsoon: Oct Nov.
I tried using the selectByDate() function from the openair package, but no luck yet. I am a novice in R, so any help would be highly appreciated.
Thanks!
CodePudding user response:
Please see the lubridate package, which makes working with dates and times a bit easier.
For your problem, I guess you can use sapply:
# pass the date column as a vector so the function is called once per date
df$season <- sapply(df$date, assign_season)
where assign_season is:
assign_season <- function(date){
# return a season based on date
}
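One possible way to fill in that stub is sketched below. It is only an illustration: it assumes the dates have already been parsed into Date or POSIXct values, that there are no missing dates, and that 15 June itself counts as monsoon (the question leaves that day ambiguous). The toy dates are made up and do not come from SampleDF.csv.

```r
# Hypothetical implementation of assign_season(); boundaries follow the question.
# Assumes `date` is a single, non-missing Date or POSIXct value.
assign_season <- function(date) {
  d <- as.POSIXlt(date)
  month <- d$mon + 1   # POSIXlt months are 0-based
  day <- d$mday
  if (month %in% c(12, 1, 2)) {
    "winter"
  } else if (month %in% 3:5 || (month == 6 && day < 15)) {
    "pre_monsoon"
  } else if (month %in% 6:9) {
    "monsoon"          # 15 June onwards, through September
  } else {
    "post_monsoon"     # October and November
  }
}

# Toy dates for illustration only
dates <- as.Date(c("2015-01-10", "2015-06-14", "2015-06-15", "2015-11-30"))
seasons <- sapply(dates, assign_season)
print(seasons)
# "winter" "pre_monsoon" "monsoon" "post_monsoon"
```

Because the function only looks at the month and day, it does not depend on which years the data covers.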
Once you have the seasons, you can divide the data frame easily:
winter = subset(df, season == "winter")
# and so on
Sorry, I have to rush now, but I can come back and finish this if someone else hasn't answered already.
EDIT:
So, R does have a built-in function, cut, that can work on dates and split a vector based on date ranges.
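As a small sketch of how cut behaves on dates, the toy example below uses made-up dates rather than the sample file. For Date input, each interval includes its start date and excludes its end date, so a date that falls exactly on a break belongs to the season that starts there.

```r
# Toy dates, not taken from SampleDF.csv
x <- as.Date(c("2015-02-28", "2015-03-01", "2015-06-14", "2015-06-15"))

# Four break points define three intervals: [Dec 1, Mar 1), [Mar 1, Jun 15), [Jun 15, Oct 1)
breaks <- as.Date(c("2014-12-01", "2015-03-01", "2015-06-15", "2015-10-01"))

season <- cut(x, breaks = breaks, labels = c("winter", "pre_monsoon", "monsoon"))
print(as.character(season))
# "winter" "pre_monsoon" "pre_monsoon" "monsoon"
```

Note that there must be one label fewer than there are break points, and dates outside the outermost breaks come back as NA.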
For your data, I did this like so:
library(lubridate)
library(dplyr)
df = read.csv('SampleDF.csv')
## parse the timestamps and keep only the calendar date
## (Date is used for both the data and the breaks, so time zones cannot shift a boundary)
df <- df %>%
  mutate(date_reformat = as.Date(mdy_hm(date)))
## define breaks & labels; each interval includes its start date and excludes its end date
breaks = c("2014-12-01", "2015-03-01", "2015-06-15", "2015-10-01", "2015-12-01", "2016-03-01", "2016-06-15", "2016-10-01", "2016-12-01", "2017-03-01")
labels = c("winter", "pre_monsoon", "monsoon", "post_monsoon", "winter", "pre_monsoon", "monsoon", "post_monsoon", "winter")
df["season"] = cut(df$date_reformat, breaks=as.Date(breaks), labels=labels)
splits = list()
for (s in c("winter", "pre_monsoon", "monsoon", "post_monsoon")){
  splits[[s]] = subset(df, season == s)[c("date", "value")]
}
Now the splits list should contain all the data you need, one data frame per season.
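As an alternative to the loop, base R's split can build the same kind of list in one call. The sketch below uses a small made-up data frame standing in for df after the season column has been added; the column names date and value are taken from the code above, and the rows are invented.

```r
# Toy stand-in for df with a season column already assigned
df <- data.frame(
  date = c("1/5/2015 0:00", "4/2/2015 0:00", "7/20/2015 0:00", "1/9/2016 0:00"),
  value = c(1.2, 3.4, 5.6, 7.8),
  season = factor(c("winter", "pre_monsoon", "monsoon", "winter"))
)

# One data frame per season, keeping only the date and value columns
splits <- split(df[c("date", "value")], df$season)

print(names(splits))
print(splits[["winter"]])
```

A season that never occurs in the data but is still a factor level gets an empty data frame in the result; passing drop = TRUE to split leaves such levels out.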