Home > Enterprise >  Replace NA values in dataframe with variables in the next column (R)
Replace NA values in dataframe with variables in the next column (R)

Time:01-09

I am new still trying to learn R and I could not find the answers I am looking for in any other thread.

I have a dataset with (for simplicity) 5 columns. Columns 1,2, and4 always have values, but in some rows column 3 doesn't. Below is an example:

Current

A  B  C  D  E
1  1  2  3 
1  2  NA 4  5
1  2  3  4 
1  3  NA 9  7
1  2  NA 5  6

I want to make it so that the NA's are replaced by the value in column D, and then the value in col E is shifted to D, etc.

Desired output:

A  B  C  D  E
1  1  2  3  NA
1  2  4  5  NA
1  2  3  4  NA
1  3  9  7  NA
1  2  5  6  NA

I copied what was on different Stack overflow threads and none achieved what I wanted.

na.omit gets rid of the row. Any help is greatly appreciated.

CodePudding user response:

Data

data <- structure(list(A = c(1L, 1L, 1L, 1L, 1L), B = c(1L, 2L, 2L, 3L, 
2L), C = c(2L, NA, 3L, NA, NA), D = c(3L, 4L, 4L, 9L, 5L), E = c(NA, 
5L, NA, 7L, 6L)), class = "data.frame", row.names = c(NA, -5L
))

Code

library(dplyr)

data %>% 
  mutate(
    aux = C,
    C = if_else(is.na(aux),D,C),
    D = if_else(is.na(aux),E,D),
    E = NA
  ) %>% 
  select(-aux)

Output

  A B C D  E
1 1 1 2 3 NA
2 1 2 4 5 NA
3 1 2 3 4 NA
4 1 3 9 7 NA
5 1 2 5 6 NA

CodePudding user response:

Replacement operation all in one go:

dat[is.na(dat$C), c("C","D","E")] <- c(dat[is.na(dat$C), c("D","E")], NA)
dat
#  A B C D  E
#1 1 1 2 3 NA
#2 1 2 4 5 NA
#3 1 2 3 4 NA
#4 1 3 9 7 NA
#5 1 2 5 6 NA

Where dat was:

dat <- read.table(text="A  B  C  D  E
1  1  2  3 
1  2  NA 4  5
1  2  3  4 
1  3  NA 9  7
1  2  NA 5  6", fill=TRUE, header=TRUE)

CodePudding user response:

A base R universal approach using order without prior knowledge of NA positions.

setNames(data.frame(t(apply(data, 1, function(x) 
  x[order(is.na(x))]))), colnames(data))
  A B C D  E
1 1 1 2 3 NA
2 1 2 4 5 NA
3 1 2 3 4 NA
4 1 3 9 7 NA
5 1 2 5 6 NA

Using dplyr

library(dplyr)

t(data) %>% 
  data.frame() %>% 
  mutate(across(everything(), ~ .x[order(is.na(.x))])) %>% 
  t() %>% 
  as_tibble()
# A tibble: 5 × 5
      A     B     C     D     E
  <int> <int> <int> <int> <int>
1     1     1     2     3    NA
2     1     2     4     5    NA
3     1     2     3     4    NA
4     1     3     9     7    NA
5     1     2     5     6    NA

Data

data <- structure(list(A = c(1L, 1L, 1L, 1L, 1L), B = c(1L, 2L, 2L, 3L, 
2L), C = c(2L, NA, 3L, NA, NA), D = c(3L, 4L, 4L, 9L, 5L), E = c(NA, 
5L, NA, 7L, 6L)), class = "data.frame", row.names = c(NA, -5L
))
  • Related