I have this simple data frame,
colx <- c(2002-01, 2002-02, 2002-03, 2002-04)
cola <- c(10, 20, NA, 40)
colb <- c(NA, NA, NA, 400)
colc <- c(1000, 2000, 3000, 4000)
df <- data.frame(colx, cola, colb, colc)
df
> df
colx cola colb colc
1 2001 10 NA 1000
2 2000 20 NA 2000
3 1999 NA NA 3000
4 1998 40 400 4000
I want to convert it to long shape,
library(reshape2)
df_long <- melt(df, id.vars = c("colx"))
df_long
> df_long
colx variable value
1 2001 cola 10
2 2000 cola 20
3 1999 cola NA
4 1998 cola 40
5 2001 colb NA
6 2000 colb NA
7 1999 colb NA
8 1998 colb 400
9 2001 colc 1000
10 2000 colc 2000
11 1999 colc 3000
12 1998 colc 4000
However, I want to avoid NA
s if they exist at the beginning of the variable. E.g. in 'colb' the first 3 cases are NA
. I want to avoid those NA
s. However, if the NA
s appear later in the variable, e.g. case 3 in cola
, I don't want to remove it. The expected output should look as follows,
> df_long
colx variable value
1 2001 cola 10
2 2000 cola 20
3 1999 cola NA
4 1998 cola 40
5 1998 colb 400
6 2001 colc 1000
7 2000 colc 2000
8 1999 colc 3000
9 1998 colc 4000
Any suggestions on the way to achive this?
CodePudding user response:
I don't think @Maël's solution works if you have NAs both in the beginning and in the middle of time series.
You can use cumany
function in dplyr
package.
df %>%
pivot_longer(-colx) %>%
group_by(name) %>%
filter(cumany(!is.na(value)))