Fill NAs in column with either 0 or value n above repeated n times

Time:01-18

I have a data frame that looks like:

df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA))

I want it to look like this:

data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
           y = c(0, 2, 2, 0, 0, 3, 3, 3, 0, 1, 0, 0))

#>     x y
#> 1   1 0
#> 2   2 2
#> 3   3 2
#> 4   4 0
#> 5   5 0
#> 6   6 3
#> 7   7 3
#> 8   8 3
#> 9   9 0
#> 10 10 1
#> 11 11 0
#> 12 12 0

I have solved it with a while loop, but I am looking for a more R-like solution.

This is the loop solution:

df[is.na(df)] <- 0 # replace all NA with 0

i <- 1
while (i < nrow(df)) {
  if (df$y[i] < 2) { # nothing to copy forward if y is 0 or 1
    i <- i + 1
  } else {
    # copy y[i] into the next y[i] - 1 rows, then jump past them
    df$y[(i + 1):(i + df$y[i] - 1)] <- df$y[i]
    i <- i + df$y[i]
  }
}
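As a base-R baseline for the multi-column part of the question, the same loop can be wrapped in a function and applied column by column with lapply(). This is a minimal sketch: fill_loop() and dat are hypothetical names, and clipping the copy range at the last row is an added assumption (the original loop would grow the vector instead).

```r
# Hypothetical helper: the question's while loop applied to a plain vector,
# with the copy range clipped so a large value cannot run past the end.
fill_loop <- function(v) {
  v[is.na(v)] <- 0                 # every NA starts out as 0
  n <- length(v)
  i <- 1
  while (i < n) {
    if (v[i] < 2) {                # 0 or 1: nothing to copy forward
      i <- i + 1
    } else {
      last <- min(i + v[i] - 1, n) # last row this value should cover
      v[(i + 1):last] <- v[i]      # repeat the value over the following rows
      i <- i + v[i]                # jump past the filled run
    }
  }
  v
}

dat <- data.frame(
  x = 1:12,
  y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA),
  z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA)
)

cols <- c("y", "z")
dat[cols] <- lapply(dat[cols], fill_loop)  # run the loop once per column
dat
```

Because each column is handled independently, adding more columns only means extending cols.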

Bonus question: could it be done within a pipe and for multiple columns (e.g. a column z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA))?

Answer:

You can start from a vector of zeros made with numeric(), take the non-missing values with complete.cases() and repeat each one that many times with rep(), and compute the positions to fill with which(complete.cases()) and sequence():

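# Requires R >= 4.0.0, where sequence() gained the `from` argument
fill_n_repeat <- function(x){
  value = x[complete.cases(x)]      # the non-NA values, e.g. 2 3 1 for y
  idx = which(complete.cases(x))    # their positions, e.g. 2 6 10 for y
  v = numeric(length(x))            # start from all zeros
  v[sequence(value, idx)] <- rep(value, value)  # write each value n, n times, starting at its position
  v
}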

library(dplyr)

# add the z column from the bonus question
df$z <- c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA)

df %>% 
  mutate(across(y:z, fill_n_repeat))
    x y z
1   1 0 1
2   2 2 0
3   3 2 0
4   4 0 0
5   5 0 4
6   6 3 4
7   7 3 4
8   8 3 4
9   9 0 0
10 10 1 2
11 11 0 2
12 12 0 0
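To see what the index arithmetic in fill_n_repeat() does, here is a step-by-step breakdown for the y column. It is a sketch of the same calls used in the answer, pulled apart into named intermediate vectors (pos, vals and out are illustrative names); it needs R >= 4.0.0 for sequence(from = ).

```r
y <- c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA)

value <- y[complete.cases(y)]         # 2 3 1   : the fill values (= run lengths)
idx   <- which(complete.cases(y))     # 2 6 10  : where each run starts
pos   <- sequence(value, from = idx)  # 2 3 6 7 8 10 : every position to fill
vals  <- rep(value, value)            # 2 2 3 3 3 1  : the value for each position

out <- numeric(length(y))             # all zeros
out[pos] <- vals                      # drop each value into its positions
out
```

Note that if a value would run past the end of the column, pos contains indices larger than length(y), so the result grows and mutate() will reject it for having the wrong length; trimming with head(v, length(x)) inside the function is one option if that case can occur.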

Answer:

Group the rows so that each non-NA value starts a new group. Then, for each group, output 0s if its first element is NA (which can only happen for rows before the first non-NA value); otherwise output the first element that many times, followed by 0s. This uses base R only; if you prefer dplyr, replace transform with mutate and everything else stays the same.

f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
transform(df, y = ave(y, cumsum(!is.na(y)), FUN = f))

giving

    x y
1   1 0
2   2 2
3   3 2
4   4 0
5   5 0
6   6 3
7   7 3
8   8 3
9   9 0
10 10 1
11 11 0
12 12 0

If there are several columns, let ix hold the names (or numbers) of the columns to process and run the same f over each of them:

ix <- "y"  # e.g. c("y", "z") if df also has the z column
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
f2 <- function(i) ave(df[[i]], cumsum(!is.na(df[[i]])), FUN = f)
replace(df, ix, lapply(ix, f2))
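For the "within a pipe" part of the bonus question, one option is to turn the ave()/cumsum() step into a one-argument helper and call it inside transform(). A sketch, assuming R >= 4.1.0 for the native |> pipe; fill_groups() and dat are hypothetical names.

```r
dat <- data.frame(
  x = 1:12,
  y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA),
  z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA)
)

f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])

# Hypothetical helper: group at every non-NA value, then apply f per group.
fill_groups <- function(v) ave(v, cumsum(!is.na(v)), FUN = f)

out <- dat |>
  transform(y = fill_groups(y), z = fill_groups(z))  # columns are looked up inside dat
out
```

With dplyr loaded, the same helper scales to any number of columns without listing them one by one: dat %>% mutate(across(c(y, z), fill_groups)).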

Answer:

Alternatively, try the code below, which needs no custom function (it uses fill() from tidyr in addition to dplyr):

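library(dplyr)
library(tidyr) # for fill()

df2 <- df %>%
  mutate(z = y) %>%
  fill(z) %>%                      # carry each value down into the NA rows below it
  group_by(y, z) %>%
  mutate(row = row_number() + 1,   # position of each NA row in its run (the value row is 1)
         y = ifelse(z >= row, z, y)) %>%
  ungroup() %>%
  mutate(y = coalesce(y, 0)) %>%   # rows outside any run are still NA; set them to 0
  select(-z, -row)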
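The same carry-forward-and-count idea also works in base R without tidyr. In this sketch, ave() stands in for fill() (each run takes its first value) and for row_number() (position within the run); last_val, pos and res are illustrative names.

```r
y <- c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA)

grp      <- cumsum(!is.na(y))                        # a new run starts at every non-NA
last_val <- ave(y, grp, FUN = function(v) v[1])      # value carried down the run, like fill()
pos      <- ave(seq_along(y), grp, FUN = seq_along)  # 1, 2, 3, ... within each run

# keep the carried value while the position is within its length, else 0
res <- ifelse(!is.na(last_val) & pos <= last_val, last_val, 0)
res
# [1] 0 2 2 0 0 3 3 3 0 1 0 0
```

The !is.na(last_val) guard handles the leading NA rows, for which pos <= last_val would itself be NA.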
