The code posted at the bottom does a nice job of filling in a dataframe, using package tidyr
, so that all ID's end up with the same number of periods, in the case of period defined as number of months ("Period_1" in the below code). Base dataframe testDF
has ID of 1 with 5 periods, and ID of 50 and 60 with only 3 periods each. The tidyr
code creates additional periods ("Period_1") for ID of 50 and 60 so they too have 5 Period_1´s. The code copies down the "Bal" and "State" fields so that all ID end up with the same number of Period_1, which is correct.
However, how would I extend the calendar month expression of "Period_2" in the same manner, as illustrated immediately below?
Code:
library(tidyr)
testDF <-
data.frame(
ID = as.numeric(c(rep(1,5),rep(50,3),rep(60,3))),
Period_1 = as.numeric(c(1:5,1:3,1:3)),
Period_2 = c("2012-06","2012-07","2012-08","2012-09","2012-10","2013-06","2013-07","2013-08","2012-01","2012-02","2012-03"),
Bal = as.numeric(c(rep(10,5),21:23,36:34)),
State = c("XX","AA","BB","CC","XX","AA","BB","CC","SS","XX","AA")
)
testDFextend <-
testDF %>%
tidyr::complete(ID, nesting(Period_1)) %>%
tidyr::fill(Bal, State, .direction = "down")
testDFextend
CodePudding user response:
by
ID you can strsplit
the date, and take the elements to create a new data.frame
to merge
with.
ml <- max(with(testDF, tapply(ID, ID, length))) ## get max. period length
by(testDF, testDF$ID, \(x) {
sp <- strsplit(x$Period_2, '-')
s <- as.numeric(sp[[1]][[2]])
if (ml != nrow(x))
merge(x, data.frame(Period_2=paste0(sp[[1]][[1]], '-', sprintf('d', (s nrow(x)):(s ml - 1))),
Period_1=(nrow(x) 1):ml,
ID=x$ID[nrow(x)], Bal=x$Bal[nrow(x)], State=x$State[nrow(x)]), all=TRUE)
else x
}) |> c(make.row.names=FALSE) |> do.call(what=rbind)
# ID Period_1 Period_2 Bal State
# 1 1 1 2012-06 10 XX
# 2 1 2 2012-07 10 AA
# 3 1 3 2012-08 10 BB
# 4 1 4 2012-09 10 CC
# 5 1 5 2012-10 10 XX
# 6 50 1 2013-06 21 AA
# 7 50 2 2013-07 22 BB
# 8 50 3 2013-08 23 CC
# 9 50 4 2013-09 23 CC
# 10 50 5 2013-10 23 CC
# 11 60 1 2012-01 36 SS
# 12 60 2 2012-02 35 XX
# 13 60 3 2012-03 34 AA
# 14 60 4 2012-04 34 AA
# 15 60 5 2012-05 34 AA
Edit
For older R versions (although it's recommended to always use update software), do:
do.call(c(by(testDF, testDF$ID, function(x) {
sp <- strsplit(x$Period_2, '-')
s <- as.numeric(sp[[1]][[2]])
if (ml != nrow(x))
merge(x, data.frame(Period_2=paste0(sp[[1]][[1]], '-', sprintf('d', (s nrow(x)):(s ml - 1))),
Period_1=(nrow(x) 1):ml,
ID=x$ID[nrow(x)], Bal=x$Bal[nrow(x)], State=x$State[nrow(x)]), all=TRUE)
else x
}), make.row.names=FALSE), what=rbind)
CodePudding user response:
A tidyverse
solution based on zoo::na.spline
. Note that it does not handle year changes. It's harder than I thought, especially because zoo::na.spline
does not seem to work on yearmon
format.
library(tidyr)
library(dplyr)
testDFextend %>%
separate(Period_2, into = c("year", "month"), convert = TRUE) %>%
fill(year) %>%
group_by(ID) %>%
mutate(month = sprintf("d", zoo::na.spline(month))) %>%
unite("Period_2", year, month, sep = "-")
output
ID Period_1 Period_2 Bal State
<dbl> <dbl> <chr> <dbl> <chr>
1 1 1 2012-06 10 XX
2 1 2 2012-07 10 AA
3 1 3 2012-08 10 BB
4 1 4 2012-09 10 CC
5 1 5 2012-10 10 XX
6 50 1 2013-06 21 AA
7 50 2 2013-07 22 BB
8 50 3 2013-08 23 CC
9 50 4 2013-09 23 CC
10 50 5 2013-10 23 CC
11 60 1 2012-01 36 SS
12 60 2 2012-02 35 XX
13 60 3 2012-03 34 AA
14 60 4 2012-04 34 AA
15 60 5 2012-05 34 AA
CodePudding user response:
I think the nicest way to do this is to make use of the padr
package, which is built to pad data.frame
s where there are missing/incomplete columns.
This uses grouping and cur_data()
to make the correct date sequence in Period_2
.
library(dplyr)
library(tidyr)
library(padr)
n_periods <- 5
testDF %>%
pad_int(end_val = n_periods , by = "Period_1", group = "ID") %>%
group_by(ID) %>%
mutate(Period_2 = as.Date(paste0(Period_2, "-01"))) %>%
mutate(Period_2 = seq(cur_data()$Period_2[1], by = "months", length.out =
n_periods) %>% format("%Y-%m")) %>%
fill(Bal, State) %>%
ungroup() %>%
select(ID, Period_1, Period_2, Bal, State)
ID Period_1 Period_2 Bal State
<dbl> <dbl> <chr> <dbl> <chr>
1 1 1 2012-06 10 XX
2 1 2 2012-07 10 AA
3 1 3 2012-08 10 BB
4 1 4 2012-09 10 CC
5 1 5 2012-10 10 XX
6 50 1 2013-06 21 AA
7 50 2 2013-07 22 BB
8 50 3 2013-08 23 CC
9 50 4 2013-09 23 CC
10 50 5 2013-10 23 CC
11 60 1 2012-01 36 SS
12 60 2 2012-02 35 XX
13 60 3 2012-03 34 AA
14 60 4 2012-04 34 AA
15 60 5 2012-05 34 AA
Note that this will handle cases when the year rolls over to the next year during Period_2
.
Finally, you could adjust n_periods
if you needed a different number of periods (or use a function to figure it out automatically, like jay.sf's answer).