I have a huge data frame with several missing value that I need to replace as follow:
Cycle A | Cycle B | Cycle C | ..... |
---|---|---|---|
na | na | na | |
na | na | na | |
na | na | na | |
-1 | na | 0 | |
-1 | -2 | 0 | |
na | -2 | na | |
na | na | na | |
na | na | 1 | |
0 | -1 | 1 | |
0 | -1 | na | |
na | na | na | |
na | na | na | |
na | 0 | 2 | |
1 | 0 | 2 | |
1 | na | na | |
na | na | na |
For each column I need to replace the NA's by the next number that appears, to have something like that:
Cycle A | Cycle B | Cycle C | ..... |
---|---|---|---|
-1 | -2 | 0 | |
-1 | -2 | 0 | |
-1 | -2 | 0 | |
-1 | -2 | 0 | |
-1 | -2 | 0 | |
0 | -2 | 1 | |
0 | -1 | 1 | |
0 | -1 | 1 | |
0 | -1 | 1 | |
0 | -1 | 2 | |
1 | 0 | 2 | |
1 | 0 | 2 | |
1 | 0 | 2 | |
1 | 0 | 2 | |
1 | 1 | 3 | |
2 | 1 | 3 |
Any idea how to do that? Thank you.
CodePudding user response:
Assume you want to perform replacement in all columns that starts_with
Cycle.
The first fill
function replaces NA
with the next row values. The mutate
function replaces NA
in the last row to be last non-NA value 1.
library(tidyverse)
df %>%
fill(starts_with("Cycle"), .direction = "up") %>%
mutate(across(starts_with("Cycle"), ~ replace_na(.x, last(.x[!is.na(.x)]) 1)))
CycleA CycleB CycleC
1 -1 -2 0
2 -1 -2 0
3 -1 -2 0
4 -1 -2 0
5 -1 -2 0
6 0 -2 1
7 0 -1 1
8 0 -1 1
9 0 -1 1
10 0 -1 2
11 1 0 2
12 1 0 2
13 1 0 2
14 1 0 2
15 1 1 3
16 2 1 3
CodePudding user response:
First, convert "na"
to NA
and type.covert
for numeric values.
dat <- replace(dat, dat== 'na', NA) |> type.convert(as.is=TRUE)
Next, I might be wrong, are you looking for this underlying structure?
f <- \(x) {
rp <- cumsum(c(0, diff(!is.na(x))) > 0) min(x, na.rm=TRUE)
nas <- is.na(x)
x[nas] <- rp[nas]
x
}
cols <- c("CycleA", "CycleB", "CycleC") ## select columns
dat[cols] <- lapply(dat[cols], f)
dat
# CycleA CycleB CycleC
# 1 -1 -2 0
# 2 -1 -2 0
# 3 -1 -2 0
# 4 -1 -2 0
# 5 -1 -2 0
# 6 0 -2 1
# 7 0 -1 1
# 8 0 -1 1
# 9 0 -1 1
# 10 0 -1 2
# 11 1 0 2
# 12 1 0 2
# 13 1 0 2
# 14 1 0 2
# 15 1 1 3
# 16 2 1 3
Data:
dat <- structure(list(CycleA = c("na", "na", "na", "-1", "-1", "na",
"na", "na", "0", "0", "na", "na", "na", "1", "1", "na"), CycleB = c("na",
"na", "na", "na", "-2", "-2", "na", "na", "-1", "-1", "na", "na",
"0", "0", "na", "na"), CycleC = c("na", "na", "na", "0", "0",
"na", "na", "1", "1", "na", "na", "na", "2", "2", "na", "na")), class = "data.frame", row.names = c(NA,
-16L))