Home > Net >  replace missing data in multiple column by different specific value
replace missing data in multiple column by different specific value

Time:04-21

I have a huge data frame with several missing value that I need to replace as follow:

Cycle A Cycle B Cycle C .....
na na na
na na na
na na na
-1 na 0
-1 -2 0
na -2 na
na na na
na na 1
0 -1 1
0 -1 na
na na na
na na na
na 0 2
1 0 2
1 na na
na na na

For each column I need to replace the NA's by the next number that appears, to have something like that:

Cycle A Cycle B Cycle C .....
-1 -2 0
-1 -2 0
-1 -2 0
-1 -2 0
-1 -2 0
0 -2 1
0 -1 1
0 -1 1
0 -1 1
0 -1 2
1 0 2
1 0 2
1 0 2
1 0 2
1 1 3
2 1 3

Any idea how to do that? Thank you.

CodePudding user response:

Assume you want to perform replacement in all columns that starts_with Cycle.

The first fill function replaces NA with the next row values. The mutate function replaces NA in the last row to be last non-NA value 1.

library(tidyverse)

df %>% 
  fill(starts_with("Cycle"), .direction = "up") %>% 
  mutate(across(starts_with("Cycle"), ~ replace_na(.x, last(.x[!is.na(.x)])   1)))

   CycleA CycleB CycleC
1      -1     -2      0
2      -1     -2      0
3      -1     -2      0
4      -1     -2      0
5      -1     -2      0
6       0     -2      1
7       0     -1      1
8       0     -1      1
9       0     -1      1
10      0     -1      2
11      1      0      2
12      1      0      2
13      1      0      2
14      1      0      2
15      1      1      3
16      2      1      3

CodePudding user response:

First, convert "na" to NA and type.covert for numeric values.

dat <- replace(dat, dat== 'na', NA) |> type.convert(as.is=TRUE)

Next, I might be wrong, are you looking for this underlying structure?

f <- \(x) {
  rp <- cumsum(c(0, diff(!is.na(x))) > 0)   min(x, na.rm=TRUE)
  nas <- is.na(x)
  x[nas] <- rp[nas]
  x
}

cols <- c("CycleA", "CycleB", "CycleC")  ## select columns

dat[cols] <- lapply(dat[cols], f)
dat
#    CycleA CycleB CycleC
# 1      -1     -2      0
# 2      -1     -2      0
# 3      -1     -2      0
# 4      -1     -2      0
# 5      -1     -2      0
# 6       0     -2      1
# 7       0     -1      1
# 8       0     -1      1
# 9       0     -1      1
# 10      0     -1      2
# 11      1      0      2
# 12      1      0      2
# 13      1      0      2
# 14      1      0      2
# 15      1      1      3
# 16      2      1      3

Data:

dat <- structure(list(CycleA = c("na", "na", "na", "-1", "-1", "na", 
"na", "na", "0", "0", "na", "na", "na", "1", "1", "na"), CycleB = c("na", 
"na", "na", "na", "-2", "-2", "na", "na", "-1", "-1", "na", "na", 
"0", "0", "na", "na"), CycleC = c("na", "na", "na", "0", "0", 
"na", "na", "1", "1", "na", "na", "na", "2", "2", "na", "na")), class = "data.frame", row.names = c(NA, 
-16L))
  • Related