I would like to ask how I can add the missing months to a specific column in a data frame.
starttime
1 2016-02-26
2 2016-04-12
3 2016-04-22
4 2016-08-04
5 2016-09-15
6 2016-09-16
7 2016-09-20
8 2016-09-22
9 2017-06-02
I would like to transform this into 2016-02-26 -> 2016-03-01 -> 2016-04-12 -> 2016-04-22 -> 2016-05-01 and so on (with an NA value for each of the inserted months).
CodePudding user response:
This is surprisingly challenging. What you want is to add the missing months to your vector. To do this, compute all months between the first and last dates, then check which ones your data already contains.
library(lubridate)
mydates <- as_date(c("2001-05-04", "2001-05-30", "2001-07-15", "2001-10-20"))
n <- length(mydates)
mymonths <- month(mydates)
myyears <- year(mydates)
# year-month keys of the dates already present
myyearmonths <- paste(myyears, mymonths, sep='-')
df <- data.frame(mydates, myyears, mymonths)
# first day of the first and last month in the data
firstdate <- floor_date(min(mydates), unit = "month")
lastdate <- floor_date(max(mydates), unit = "month")
# approximate number of months between the two
nbmonths <- as.numeric(round((lastdate - firstdate)/(365.25/12)))
# first day of every month in the range
fulldates <- firstdate %m+% months(0:nbmonths)
fullmonths <- month(fulldates)
fullyears <- year(fulldates)
fullyearmonths <- paste(fullyears, fullmonths, sep='-')
# months in the range that are absent from the data
toadd <- as_date(ym(fullyearmonths[!fullyearmonths %in% myyearmonths]))
result <- c(mydates, toadd)
result <- result[order(result)]
result
[1] "2001-05-04" "2001-05-30" "2001-06-01" "2001-07-15" "2001-08-01" "2001-09-01" "2001-10-20"
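The month count above is estimated from the average month length. A minimal sketch of the same idea that skips that estimate, assuming seq() with by = "month" is acceptable for generating the month starts, could look like this (same example dates as above):

```r
library(lubridate)

mydates <- as_date(c("2001-05-04", "2001-05-30", "2001-07-15", "2001-10-20"))

# First day of every month between the earliest and latest date
fulldates <- seq(floor_date(min(mydates), unit = "month"),
                 floor_date(max(mydates), unit = "month"),
                 by = "month")

# Keep only the month starts whose month does not occur in the data
toadd <- fulldates[!fulldates %in% floor_date(mydates, unit = "month")]

# Combine with the original dates and sort
result <- sort(c(mydates, toadd))
result
```

Comparing month starts directly also avoids building and parsing the year-month strings.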
CodePudding user response:
You may merge with an auxiliary data frame containing the missing dates. "Date" format is required. In a function f we use the range of the dates, use the substrings several times (i.e. cut away the days), concatenate them with the original dates and sort the thing.
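f <- \(x) {
  # first day of every month between the earliest and latest date
  sq <- do.call(seq.Date, c(as.list(as.Date(paste0(substr(range(as.Date(x)), 1, 7), '-01'))), 'month'))
  # keep only the month starts whose month is NOT in x, then combine and sort
  sort(c(as.Date(x), sq[!substr(sq, 1, 7) %in% substr(x, 1, 7)]))
}
merge(transform(df, starttime=as.Date(starttime)), data.frame(starttime=f(df$starttime)), all=TRUE)
#     starttime  X
# 1  2016-02-26  0
# 2  2016-03-01 NA
# 3  2016-04-12  0
# 4  2016-04-22  0
# 5  2016-05-01 NA
# 6  2016-06-01 NA
# 7  2016-07-01 NA
# 8  2016-08-04  0
# 9  2016-09-15  0
# 10 2016-09-16  0
# 11 2016-09-20  0
# 12 2016-09-22  0
# 13 2016-10-01 NA
# 14 2016-11-01 NA
# 15 2016-12-01 NA
# 16 2017-01-01 NA
# 17 2017-02-01 NA
# 18 2017-03-01 NA
# 19 2017-04-01 NA
# 20 2017-05-01 NA
# 21 2017-06-02  0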
Data:
x <- c('2016-02-26', '2016-04-12', '2016-04-22', '2016-08-04', '2016-09-15', '2016-09-16', '2016-09-20', '2016-09-22', '2017-06-02')
df <- data.frame(starttime=x, X=0)
CodePudding user response:
We could separate the dates into day, month and year and use complete:
library(tidyverse)
df <- tibble(starttime = as_date(c("2016-02-26",
                                   "2016-04-12",
                                   "2016-04-22",
                                   "2016-08-04",
                                   "2016-09-15",
                                   "2016-09-16",
                                   "2016-09-20",
                                   "2016-09-22",
                                   "2017-06-02")))
df |>
  mutate(temp_day = day(starttime),
         temp_month = month(starttime),
         temp_year = year(starttime)) |>
  complete(temp_month = 1:12,
           temp_year = min(temp_year):max(temp_year)) |>
  mutate(temp_day = ifelse(is.na(temp_day), 1, temp_day),
         starttime = as_date(paste(temp_year, temp_month, temp_day, sep = '-'))) |>
  select(-starts_with("temp")) |>
  arrange(starttime)
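Output (complete fills all twelve months of both years, so months before the first date and after the last date are added as well):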
# A tibble: 28 × 1
starttime
<date>
1 2016-01-01
2 2016-02-26
3 2016-03-01
4 2016-04-12
5 2016-04-22
6 2016-05-01
7 2016-06-01
8 2016-07-01
9 2016-08-04
10 2016-09-15
11 2016-09-16
12 2016-09-20
13 2016-09-22
14 2016-10-01
15 2016-11-01
16 2016-12-01
17 2017-01-01
18 2017-02-01
19 2017-03-01
20 2017-04-01
21 2017-05-01
22 2017-06-02
23 2017-07-01
24 2017-08-01
25 2017-09-01
26 2017-10-01
27 2017-11-01
28 2017-12-01
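If only the months between the first and last date should be filled, one possible variation is to complete over a sequence of month starts instead of over 1:12. This is a sketch under some assumptions: the X column is borrowed from the data of the previous answer to show where the NA values end up, and month_start and out are hypothetical names introduced only for this example.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- tibble(starttime = as_date(c("2016-02-26", "2016-04-12", "2016-04-22",
                                   "2016-08-04", "2016-09-15", "2016-09-16",
                                   "2016-09-20", "2016-09-22", "2017-06-02")),
             X = 0)

out <- df |>
  # first day of the month each date falls in
  mutate(month_start = floor_date(starttime, unit = "month")) |>
  # add one row for every month start in the range that has no data
  complete(month_start = seq(min(month_start), max(month_start), by = "month")) |>
  # new rows have no starttime yet: use the month start for them
  mutate(starttime = coalesce(starttime, month_start)) |>
  select(-month_start) |>
  arrange(starttime)

out
```

The rows added by complete keep NA in X, while the original rows keep their values.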