Home > Software engineering >  Remove NAs and drag other values up
Remove NAs and drag other values up

Time:04-13

I've transposed a matrix and now I have a dataframe that looks like this:

   Placebo  High  Medium     Low
      <dbl> <dbl>   <dbl>   <dbl>
 1  0.0400  NA    NA      NA     
 2  0.04    NA    NA      NA     
 3  0.0200  NA    NA      NA     
 4  0.03    NA    NA      NA     
 5 -0.00500 NA    NA      NA     
 6  0.0300  NA    NA      NA     
 7 NA       -0.04 NA      NA     
 8 NA       NA     0.0100 NA     
 9 NA       NA    NA       0.0100
10 NA       NA    NA       0.04 

I would like to remove the NAs so every value goes up, and I get something like:

   Placebo  High  Medium     Low
      <dbl> <dbl>   <dbl>   <dbl>
 1  0.0400  -0.04   0.0100  0.0100
 2  0.04                    0.04 
 3  0.0200  
 4  0.03     
 5 -0.00500 
 6  0.0300    

It doens't matter if the columns do not have the same length.

CodePudding user response:

You can't end up with a data frame because you have columns of different lengths. You will need to end up with a list, by doing something like:

df  <- read.table(text = "Index Placebo  High  Medium     Low
 1  0.0400  NA    NA      NA     
 2  0.04    NA    NA      NA     
 3  0.0200  NA    NA      NA     
 4  0.03    NA    NA      NA     
 5 -0.00500 NA    NA      NA     
 6  0.0300  NA    NA      NA     
 7 NA       -0.04 NA      NA     
 8 NA       NA     0.0100 NA     
 9 NA       NA    NA       0.0100
10 NA       NA    NA       0.04 ", header = TRUE) 
df["Index"]  <- NULL

lapply(df, function(x) x[!is.na(x)])

# $Placebo
# [1]  0.040  0.040  0.020  0.030 -0.005  0.030

# $High
# [1] -0.04

# $Medium
# [1] 0.01

# $Low
# [1] 0.01 0.04

CodePudding user response:

You can have a dataframe that looks like your desired output. However, it's impractical to have such kind of dataframe.

The "empty" cells are filled with "", therefore, the column type will be character instead of numeric.

do.call(data.frame, 
        lapply(
          sapply(df, function(x) x[!is.na(x)]), 
          function(y) append(y, rep("", max(apply(df, 2, function(x) sum(!is.na(x)))) - length(y))))
        )

  Placebo  High Medium  Low
1    0.04 -0.04   0.01 0.01
2    0.04              0.04
3    0.02                  
4    0.03                  
5  -0.005                  
6    0.03                  

Replacing "" with NA might be better, which preserves the column types as numeric.

do.call(data.frame, 
        lapply(
          sapply(df, function(x) x[!is.na(x)]), 
          function(y) append(y, rep(NA, max(apply(df, 2, function(x) sum(!is.na(x)))) - length(y))))
        )

  Placebo  High Medium  Low
1   0.040 -0.04   0.01 0.01
2   0.040    NA     NA 0.04
3   0.020    NA     NA   NA
4   0.030    NA     NA   NA
5  -0.005    NA     NA   NA
6   0.030    NA     NA   NA

Here's a potentially faster method that only uses one lapply instead of lapply sapply.

do.call(data.frame,
        setNames(lapply(1:ncol(df), function(x)
          append(
            df[, x][!is.na(df[, x])],
            rep(NA, max(apply(df, 2, function(x)
              sum(!is.na(x)))) - length(df[, x][!is.na(df[, x])]))
          )),
          colnames(df)))
  • Related