Home > Software design >  Wide to long omitting the early NAs
Wide to long omitting the early NAs

Time:08-16

I have this simple data frame,

colx <- c(2002-01, 2002-02, 2002-03, 2002-04)
cola <- c(10, 20, NA, 40)
colb <- c(NA, NA, NA, 400)
colc <- c(1000, 2000, 3000, 4000)
    
df <- data.frame(colx, cola, colb, colc)
df
> df
  colx cola colb colc
1 2001   10   NA 1000
2 2000   20   NA 2000
3 1999   NA   NA 3000
4 1998   40  400 4000

I want to convert it to long shape,

library(reshape2)
df_long <- melt(df, id.vars = c("colx"))
df_long

> df_long
   colx variable value
1  2001     cola    10
2  2000     cola    20
3  1999     cola    NA
4  1998     cola    40
5  2001     colb    NA
6  2000     colb    NA
7  1999     colb    NA
8  1998     colb   400
9  2001     colc  1000
10 2000     colc  2000
11 1999     colc  3000
12 1998     colc  4000

However, I want to avoid NAs if they exist at the beginning of the variable. E.g. in 'colb' the first 3 cases are NA. I want to avoid those NAs. However, if the NAs appear later in the variable, e.g. case 3 in cola, I don't want to remove it. The expected output should look as follows,

> df_long
  colx variable value
1 2001     cola    10
2 2000     cola    20
3 1999     cola    NA
4 1998     cola    40
5 1998     colb   400
6 2001     colc  1000
7 2000     colc  2000
8 1999     colc  3000
9 1998     colc  4000

Any suggestions on the way to achive this?

CodePudding user response:

I don't think @Maël's solution works if you have NAs both in the beginning and in the middle of time series.

You can use cumany function in dplyr package.

df %>% 
  pivot_longer(-colx) %>% 
  group_by(name) %>%
  filter(cumany(!is.na(value)))
  • Related