Home > OS >  Replacing NA from a specific column with latest non-NA value from that row in R
Replacing NA from a specific column with latest non-NA value from that row in R

Time:03-14

I have a large data frame that looks like this

library(tidyverse)
df <- tibble(fruits=c("apple","banana","ananas"),
             col1=c(4,NA,NA),
             col2=c(5,NA,10),
             col3=c(10,100,5),
             col4=c(20,4,1))
df
#> # A tibble: 3 × 5
#>   fruits  col1  col2  col3  col4
#>   <chr>  <dbl> <dbl> <dbl> <dbl>
#> 1 apple      4     5    10    20
#> 2 banana    NA    NA   100     4
#> 3 ananas    NA    10     5     1

Created on 2022-03-13 by the reprex package (v2.0.1)

When there is an NA value only of the col1, I want to fill the NA with the last non-NA value of the row and return the value to NA.

I want my data to look like this.

#> # A tibble: 3 × 5
#>   fruits  col1  col2  col3  col4
#>   <chr>  <dbl> <dbl> <dbl> <dbl>
#> 1 apple      4     5    10    20
#> 2 banana    100   NA    NA     4
#> 3 ananas    10    NA     5     1

CodePudding user response:

If it is a large data.frame, it may be more efficient to use vectorized solution instead of looping over rows. Get the logical index of elements in 'col1' that are NA ('i1'), use max.col to return the column index of first non-NA element from columns 3 to 5 ('j1'), create a row/column index matrix (m1) with cbind, assign the 'col1' where there are missing values with the elements extracted from 3 to 5 columns using 'm1' and assign those elements to NA

df1 <- as.data.frame(df)
i1 <- is.na(df1$col1)
j1 <- max.col(!is.na(df1[3:5]), "first")
m1 <- cbind(which(i1), j1[i1])
df1$col1[i1] <- df1[3:5][m1]
df1[3:5][m1] <- NA

-output

> df1
  fruits col1 col2 col3 col4
1  apple    4    5   10   20
2 banana  100   NA   NA    4
3 ananas   10   NA    5    1
  • Related