How to move up all columns in a dataframe in R?

Time:11-06

I have a dataframe that looks like this...

C_array D_array E_array
20.000    NA        NA
0.000     NA        NA
0.000     NA        NA
  NA      17.000    NA
  NA      17.000    NA
  NA      21.000    NA
  NA      49.000    NA
  NA      52.000    NA
  NA      NA      31.000
  NA      NA      31.000
  NA      NA      32.000
  NA      NA      32.000
  NA      NA      34.000
  NA      NA      34.000
  NA      NA      34.000
  NA      NA      34.000
  NA      NA      34.000

How can I remove the leading NA values? I want it to look like this...

C_array D_array E_array
20.000   17.000  31.000
0.000    17.000  31.000
0.000    21.000  32.000
  NA     49.000  32.000
  NA     52.000  34.000
  NA      NA     34.000
  NA      NA     34.000
  NA      NA     34.000
  NA      NA     34.000

CodePudding user response:

We can reorder each column so that the non-NA elements come before the NAs, then drop the rows in which every value is NA:

df2 <- df1
df2[] <- lapply(df2, function(x) x[order(is.na(x))])
df2[rowSums(is.na(df2)) < ncol(df2),]

-output

 C_array D_array E_array
1      20      17      31
2       0      17      31
3       0      21      32
4      NA      49      32
5      NA      52      34
6      NA      NA      34
7      NA      NA      34
8      NA      NA      34
9      NA      NA      34
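
To see why the reordering step works, here is a minimal sketch on a single vector (the vector `x` is made up for illustration and is not part of the question). `is.na(x)` is a logical vector, `order()` sorts FALSE before TRUE, and ties are left in their original order, so the non-NA values should keep their relative sequence.

```r
# Hypothetical example vector, not from the question
x <- c(NA, NA, 5, 3, NA, 9)

# is.na(x) is FALSE for values and TRUE for NAs; order() puts the FALSE
# positions first and leaves ties in their original order
idx <- order(is.na(x))

# Indexing with idx moves the values to the front and the NAs to the end
shifted <- x[idx]
shifted
```

Note that the values are not sorted by size: 5 still comes before 3, only the NAs move.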

Or do the same with the tidyverse:

library(dplyr)
df1 %>%
    mutate(across(everything(), ~ .[order(is.na(.))])) %>% 
    filter(!if_all(everything(), is.na))

-output

  C_array D_array E_array
1      20      17      31
2       0      17      31
3       0      21      32
4      NA      49      32
5      NA      52      34
6      NA      NA      34
7      NA      NA      34
8      NA      NA      34
9      NA      NA      34

data

df1 <- structure(list(C_array = c(20, 0, 0, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), D_array = c(NA, NA, NA, 17, 
17, 21, 49, 52, NA, NA, NA, NA, NA, NA, NA, NA, NA), E_array = c(NA, 
NA, NA, NA, NA, NA, NA, NA, 31, 31, 32, 32, 34, 34, 34, 34, 34
)), class = "data.frame", row.names = c(NA, -17L))

CodePudding user response:

Here's an option, but I'm a bit worried about the format: if being on the same row doesn't mean anything, you might want to keep your result as a list rather than a table.

res_list <- lapply(df, \(x) x[!is.na(x)]) # you might stop here
as.data.frame(lapply(res_list, `length<-`, max(lengths(res_list))))
#>   C_array D_array E_array
#> 1      20      17      31
#> 2       0      17      31
#> 3       0      21      32
#> 4      NA      49      32
#> 5      NA      52      34
#> 6      NA      NA      34
#> 7      NA      NA      34
#> 8      NA      NA      34
#> 9      NA      NA      34
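
The padding step relies on `length<-`, the function form of `length(x) <- n`. A small sketch, assuming a made-up ragged list (`ragged` and its contents are hypothetical), shows how extending a vector this way fills the new slots with NA:

```r
# Hypothetical ragged list, standing in for the per-column results
ragged <- list(a = c(1, 2), b = c(3, 4, 5, 6), c = 7)

# Target length is that of the longest element
n <- max(lengths(ragged))

# `length<-`(x, n) extends x to length n and pads with NA
padded <- lapply(ragged, `length<-`, n)

# Now that all elements have equal length, they fit in a data frame
as.data.frame(padded)
```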

CodePudding user response:

Are these real NA values, or text strings such as "NA"? If it is the first scenario:

df <- df[!is.na(df$E_array),]

Otherwise:

df <- df[df$E_array != "NA" ,]
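
If the columns do turn out to hold the text "NA", one possible way to check and convert them is sketched below. The frame `df_chr` is invented for illustration, and using `type.convert()` is my suggestion rather than part of this answer.

```r
# Hypothetical data where missing values were read in as the text "NA"
df_chr <- data.frame(
  C_array = c("20", "0", "NA"),
  E_array = c("NA", "31", "32"),
  stringsAsFactors = FALSE
)

# Character columns are a hint that "NA" is text rather than a missing value
sapply(df_chr, class)

# type.convert() treats the string "NA" as missing by default (na.strings = "NA")
df_num <- type.convert(df_chr, as.is = TRUE)
sapply(df_num, class)
is.na(df_num$E_array)
```

After the conversion, the approaches that rely on `is.na()` apply as usual.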

CodePudding user response:

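You can use na.omit(YOURDATA) to remove NA values. Note that on a data frame it drops every row that contains any NA, so for this data it needs to be applied to each column separately.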
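
A small sketch of the difference between calling `na.omit()` on a whole data frame and on each column, assuming a made-up frame `toy` with the same staggered-NA shape as the question:

```r
# Small hypothetical frame with the same staggered-NA shape as the question
toy <- data.frame(a = c(1, 2, NA, NA), b = c(NA, NA, 3, 4))

# On the whole data frame, every row has at least one NA, so no rows remain
nrow(na.omit(toy))

# Applied column by column, each column keeps its own non-NA values;
# as.vector() strips the "na.action" attribute that na.omit() attaches
per_col <- lapply(toy, function(x) as.vector(na.omit(x)))
per_col
```

The per-column result is a list, so columns of different lengths would still need padding before going back into a data frame.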

CodePudding user response:

df <- structure(list(C_array = c(20L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                     D_array = c(NA, NA, NA, 17L, 17L, 21L, 49L, 52L, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                     E_array = c(NA, NA, NA, NA, NA, NA, NA, NA, 31L, 31L, 32L, 32L, 34L, 34L, 34L, 34L, 34L)),
                class = "data.frame", row.names = c(NA, -17L))

l <- sapply(df, function(x) x[!is.na(x)])
res <- sapply(l, function(x){length(x) <- max(lengths(l)); x})
res
#>       C_array D_array E_array
#>  [1,]      20      17      31
#>  [2,]       0      17      31
#>  [3,]       0      21      32
#>  [4,]      NA      49      32
#>  [5,]      NA      52      34
#>  [6,]      NA      NA      34
#>  [7,]      NA      NA      34
#>  [8,]      NA      NA      34
#>  [9,]      NA      NA      34

Created on 2021-11-05 by the reprex package (v2.0.1)

CodePudding user response:

Use stack, na.omit, and unstack over the columns, then adapt the lengths as in Moody's answer.

r <- stack(dat) |> na.omit() |> unstack() |>
  {\(x) lapply(x, `length<-`, max(lengths(x)))}() |>
  as.data.frame()
r
#   C_array D_array E_array
# 1      20      17      31
# 2       0      17      31
# 3       0      21      32
# 4      NA      49      32
# 5      NA      52      34
# 6      NA      NA      34
# 7      NA      NA      34
# 8      NA      NA      34
# 9      NA      NA      34

This keeps the result free of the "na.action" attribute.

str(r)
# 'data.frame': 9 obs. of  3 variables:
# $ C_array: num  20 0 0 NA NA NA NA NA NA
# $ D_array: num  17 17 21 49 52 NA NA NA NA
# $ E_array: num  31 31 32 32 34 34 34 34 34

Note:

R.version.string
# [1] "R version 4.1.2 (2021-11-01)"

Data:

dat <- structure(list(C_array = c(20, 0, 0, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), D_array = c(NA, NA, NA, 17, 
17, 21, 49, 52, NA, NA, NA, NA, NA, NA, NA, NA, NA), E_array = c(NA, 
NA, NA, NA, NA, NA, NA, NA, 31, 31, 32, 32, 34, 34, 34, 34, 34
)), class = "data.frame", row.names = c(NA, -17L))