I have a large data frame that looks like this
library(tidyverse)
df <- tibble(fruits=c("apple","banana","ananas"),
col1=c(4,NA,NA),
col2=c(5,NA,10),
col3=c(10,100,5),
col4=c(20,4,1))
df
#> # A tibble: 3 × 5
#> fruits col1 col2 col3 col4
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 apple 4 5 10 20
#> 2 banana NA NA 100 4
#> 3 ananas NA 10 5 1
Created on 2022-03-13 by the reprex package (v2.0.1)
When there is an NA value only of the col1, I want to fill the NA with the last non-NA value of the row and return the value to NA.
I want my data to look like this.
#> # A tibble: 3 × 5
#> fruits col1 col2 col3 col4
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 apple 4 5 10 20
#> 2 banana 100 NA NA 4
#> 3 ananas 10 NA 5 1
CodePudding user response:
If it is a large data.frame, it may be more efficient to use vectorized solution instead of looping over rows. Get the logical index of elements in 'col1' that are NA
('i1'), use max.col
to return the column index of first
non-NA element from columns 3 to 5 ('j1'), create a row/column index matrix (m1
) with cbind
, assign the 'col1' where there are missing values with the elements extracted from 3 to 5 columns using 'm1' and assign those elements to NA
df1 <- as.data.frame(df)
i1 <- is.na(df1$col1)
j1 <- max.col(!is.na(df1[3:5]), "first")
m1 <- cbind(which(i1), j1[i1])
df1$col1[i1] <- df1[3:5][m1]
df1[3:5][m1] <- NA
-output
> df1
fruits col1 col2 col3 col4
1 apple 4 5 10 20
2 banana 100 NA NA 4
3 ananas 10 NA 5 1