Home > Enterprise >  Filling in missing NA based upon factor level
Filling in missing NA based upon factor level

Time:11-23

I have been trying to convert my data from a vertical configuration to a horizontal configuratoin. Using spread (), I have managed to get it in the following structure:

plot, x1, x2, x3, x4, x5, x6, x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA
7 rows

However, obviously I want to fill in the NAs with the information from the other rows. Then my data would look like

plot , x1, x2, x3, x4, x5, x6, x7
1   0.06011071  0.09756118  -0.1422974NA    0.1809486   0.143701     0.1584451      0.1151581

I have tried multiple approaches but thus far unsuccesfull. Does anyone know how I could achieve this?

CodePudding user response:

This is a solution using dplyr. It assumes that you want to keep the first non-NA value from the top for each plot.

library(dplyr)

dat2 <- dat %>%
  group_by(plot) %>%
  summarize(
    across(
      everything(),
      .fns = ~first(.x[!is.na(.x)])
    )
  ) %>%
  ungroup()
dat2
# # A tibble: 1 x 8
#    plot     x1     x2     x3    x4    x5    x6    x7
#   <int>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     1 0.0601 0.0976 -0.142 0.181 0.144 0.158 0.115

DATA

dat <- read.table(text = "plot  x1  x2  x3  x4  x5  x6  x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA",
                  header = TRUE)

CodePudding user response:

Or:

library(tidyverse)

d <- read.table(text = "plot, x1, x2, x3, x4, x5, x6, x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA", header = TRUE)

d %>%
  summarise(across(starts_with("x"), sum, na.rm = TRUE))

#x1.        x2.        x3.       x4.      x5.       x6.        x7
#1 0.06011071 0.09756118 -0.1422974 0.1809486 0.143701 0.1584451 0.1151581

CodePudding user response:

We can also use na.omit with dplyr:

df %>% group_by(plot) %>% summarise(across(matches('x\\d '), na.omit))

CodePudding user response:

something like this?

df <- data.frame(x1 = c(1,NA,NA,NA,NA),
                 x2 = c(NA,2,NA,NA,NA),
                 x3 = c(NA,NA,3,NA,NA),
                 x4 = c(NA,NA,NA,4,NA),
                 x5 = c(NA,NA,NA,NA,5))
> df
  x1 x2 x3 x4 x5
1  1 NA NA NA NA
2 NA  2 NA NA NA
3 NA NA  3 NA NA
4 NA NA NA  4 NA
5 NA NA NA NA  5

for (i in 1:ncol(df)){
  df[,i][is.na(df[,i])] <- df[,i][!is.na(df[,i])]
}

> df
  x1 x2 x3 x4 x5
1  1  2  3  4  5
2  1  2  3  4  5
3  1  2  3  4  5
4  1  2  3  4  5
5  1  2  3  4  5

CodePudding user response:

Maybe:

library(tidyverse)

df <- read.table(text = "plot, x1, x2, x3, x4, x5, x6, x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA", header = TRUE)


#if every column has only one value and the rest are NA's

tibble(plot = 1, map_dfc(df[, -1], ~.[!is.na(.)]))
#> # A tibble: 1 × 8
#>    plot    x1.    x2.    x3.   x4.   x5.   x6.    x7
#>   <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 0.0601 0.0976 -0.142 0.181 0.144 0.158 0.115

#or gathering and spreading approach

pivot_longer(df,cols = matches('^x\\d'), names_to = 'x') %>% 
  filter(!is.na(value)) %>% 
  arrange(x) %>% 
  pivot_wider(names_from = 'x', values_from = 'value')
#> # A tibble: 1 × 8
#>   plot.    x1.    x2.    x3.   x4.   x5.   x6.    x7
#>   <int>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 0.0601 0.0976 -0.142 0.181 0.144 0.158 0.115

Created on 2021-11-22 by the reprex package (v2.0.1)

  • Related