compute missing value based on specific row ID, only on columns with NA-CodePudding

I would like to compute a missing value by way of taking the average of the lead and lag rows, for a specific row. In this example, year 2001.

df = data.frame(year = c(2000, 2001, 2002),
                 A = c(2, NA, 3),
                 B = c(2, 2, 3),
                 C = c(3, NA, 2))

have tried case_when(year == 2001, is.na(.) ~ (lead(.) lag(.) )/ 2) but cant figure out how to change or mutate all of the 2001 rows in place for all instances of NA. Imagine many columns, some filled, some not!

Thoughts? And thanks!

CodePudding user response：

You were going in the right direction. The problem is that you don't want to change the year column but A, B, and C values. In this case, you want to change A and C. See below a fix to the solution that you posted.

Solution

library(dplyr)

df %>% 
  mutate(across(A:C, ~case_when( is.na(.x) ~ (lead(.x)   lag(.x) )/ 2,
                                 TRUE ~ .x)))

Output

  year   A B   C
1 2000 2.0 2 3.0
2 2001 2.5 2 2.5
3 2002 3.0 3 2.0

PS: @akrun provided an elegant solution as well. See their comment to this solution below.

CodePudding user response：

I have a simple solution using transpose from data.table:

df2 <- data.table::transpose(df,make.names = "year",keep.names = "name")

  name 2000 2001 2002
1    A    2   NA    3
2    B    2    2    3
3    C    3   NA    2

You can then use ifelse on the column:

df2$`2001` <-  ifelse(is.na(df2$`2001`),(df2$`2000` df2$`2002`)/2,df2$`2000`)

And transform back

data.table::transpose(df2,keep.names = "year",make.names = "name")

  year   A B   C
1 2000 2.0 2 3.0
2 2001 2.5 2 2.5
3 2002 3.0 3 2.0

Another data.table way, similar to what you did:

df %>% setDT()

df[,lapply(.SD,function(x){
  fifelse(year == 2001 & is.na(x),(lead(x)   lag(x))/2,x)
}),.SDcols = c("A","B","C")]

     A B   C
1: 2.0 2 3.0
2: 2.5 2 2.5
3: 3.0 3 2.0