I have a dataframe that looks like this...
C_array D_array E_array
20.000 NA NA
0.000 NA NA
0.000 NA NA
NA 17.000 NA
NA 17.000 NA
NA 21.000 NA
NA 49.000 NA
NA 52.000 NA
NA NA 31.000
NA NA 31.000
NA NA 32.000
NA NA 32.000
NA NA 34.000
NA NA 34.000
NA NA 34.000
NA NA 34.000
NA NA 34.000
How can I remove the leading NA values? I want it to look like this...
C_array D_array E_array
20.000 17.000 31.000
0.000 17.000 31.000
0.000 21.000 32.000
NA 49.000 32.000
NA 52.000 34.000
NA NA 34.000
NA NA 34.000
NA NA 34.000
NA NA 34.000
Answer:
We could rearrange the NAs in each column so that the non-NA elements come before the NAs, and then subset the rows by removing the rows where every value is NA.
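df2 <- df1
# within each column, move the non-NA values to the top
df2[] <- lapply(df2, function(x) x[order(is.na(x))])
# keep only the rows that still have at least one non-NA value
df2[rowSums(is.na(df2)) < ncol(df2),]
Output: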
C_array D_array E_array
1 20 17 31
2 0 17 31
3 0 21 32
4 NA 49 32
5 NA 52 34
6 NA NA 34
7 NA NA 34
8 NA NA 34
9 NA NA 34
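Why order(is.na(x)) works: is.na(x) is FALSE for values and TRUE for NAs, FALSE sorts before TRUE, and order() leaves ties in their original order, so the non-NA values move to the top without being re-sorted among themselves. A minimal check on a single vector (the values are copied from D_array plus one trailing NA, purely for illustration):

```r
# A column with leading and trailing NAs
x <- c(NA, NA, NA, 17, 17, 21, 49, 52, NA)

# is.na(x) is FALSE for values and TRUE for NAs; FALSE sorts first
idx <- order(is.na(x))

# Ties keep their original order, so the values keep their sequence
shifted <- x[idx]
print(shifted)
# [1] 17 17 21 49 52 NA NA NA NA
```

Because only the NA flag is sorted, the values themselves are never compared, which is what keeps 49 and 52 in their original positions relative to 17 and 21.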
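Or do this in the tidyverse:
library(dplyr)
df1 %>%
  # within each column, move the non-NA values to the top
  mutate(across(everything(), ~ .[order(is.na(.))])) %>%
  # drop the rows where every column is NA
  filter(!if_all(everything(), is.na))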
C_array D_array E_array
1 20 17 31
2 0 17 31
3 0 21 32
4 NA 49 32
5 NA 52 34
6 NA NA 34
7 NA NA 34
8 NA NA 34
9 NA NA 34
Data:
df1 <- structure(list(C_array = c(20, 0, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), D_array = c(NA, NA, NA, 17,
17, 21, 49, 52, NA, NA, NA, NA, NA, NA, NA, NA, NA), E_array = c(NA,
NA, NA, NA, NA, NA, NA, NA, 31, 31, 32, 32, 34, 34, 34, 34, 34
)), class = "data.frame", row.names = c(NA, -17L))
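If this is needed repeatedly, the base R steps can be wrapped in a function. This is a sketch, not part of the original answer: the name push_na_down is hypothetical, and the data are rebuilt with data.frame() instead of structure() so the block runs on its own.

```r
# Hypothetical helper: move each column's non-NA values to the top,
# then drop the rows that are entirely NA
push_na_down <- function(df) {
  out <- df
  # Reorder every column independently; order() keeps ties in place
  out[] <- lapply(out, function(x) x[order(is.na(x))])
  # Keep rows with at least one non-NA value
  keep <- rowSums(is.na(out)) < ncol(out)
  out <- out[keep, , drop = FALSE]
  # Renumber the rows 1..n
  rownames(out) <- NULL
  out
}

df1 <- data.frame(
  C_array = c(20, 0, 0, rep(NA, 14)),
  D_array = c(NA, NA, NA, 17, 17, 21, 49, 52, rep(NA, 9)),
  E_array = c(rep(NA, 8), 31, 31, 32, 32, 34, 34, 34, 34, 34)
)

res <- push_na_down(df1)
print(res)
```

The drop = FALSE guard keeps the result a data frame even if the input has a single column.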
Answer:
Here's an option, but I'm a bit worried about the format: if being on the same row doesn't mean anything, you might want to keep your result as a list rather than a table.
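# drop the NAs from each column; the columns may now differ in length
res_list <- lapply(df, \(x) x[!is.na(x)]) # you might stop here
# pad every column with NA up to the longest one and rebuild a data frame
as.data.frame(lapply(res_list, `length<-`, max(lengths(res_list))))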
#> C_array D_array E_array
#> 1 20 17 31
#> 2 0 17 31
#> 3 0 21 32
#> 4 NA 49 32
#> 5 NA 52 34
#> 6 NA NA 34
#> 7 NA NA 34
#> 8 NA NA 34
#> 9 NA NA 34
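If the row alignment really carries no meaning, the intermediate list is already a usable result: each element holds one column's values, and per-column summaries work on it directly. A minimal sketch with the question's data rebuilt via data.frame() and function(x) in place of \(x), so it should also run on R versions older than 4.1:

```r
df <- data.frame(
  C_array = c(20, 0, 0, rep(NA, 14)),
  D_array = c(NA, NA, NA, 17, 17, 21, 49, 52, rep(NA, 9)),
  E_array = c(rep(NA, 8), 31, 31, 32, 32, 34, 34, 34, 34, 34)
)

# One vector per column, NAs dropped; no padding needed
res_list <- lapply(df, function(x) x[!is.na(x)])
str(res_list)

# Number of values left in each column: C_array 3, D_array 5, E_array 9
lengths(res_list)

# Summaries work directly on the list, e.g. the mean of each column
sapply(res_list, mean)
```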
Answer:
Are these real NA values or text strings such as "NA"? If it is the first scenario:
df <- df[!is.na(df$E_array),]
Otherwise:
df <- df[df$E_array != "NA" ,]
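If the values do turn out to be text, one common route is to convert the "NA" strings to real NA first, after which the is.na()-based approaches in the other answers apply unchanged. A minimal sketch, assuming a small hypothetical data frame df_chr whose columns were read in as character:

```r
# Hypothetical frame: numbers read in as text, missing entries written as "NA"
df_chr <- data.frame(
  C_array = c("20.000", "NA", "NA"),
  E_array = c("NA", "31.000", "34.000"),
  stringsAsFactors = FALSE
)

# Turn the literal "NA" strings into real NA, then convert to numeric;
# replacing first avoids as.numeric()'s "NAs introduced by coercion" warning
df_chr[] <- lapply(df_chr, function(x) as.numeric(replace(x, x == "NA", NA)))

str(df_chr)
```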
Answer:
You can use na.omit(YOURDATA) to remove NA values.
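Keep in mind that na.omit() on a data frame removes every row that contains at least one NA, while on a single vector it removes only that vector's NAs. A quick check of both cases, rebuilding the question's data with data.frame():

```r
df1 <- data.frame(
  C_array = c(20, 0, 0, rep(NA, 14)),
  D_array = c(NA, NA, NA, 17, 17, 21, 49, 52, rep(NA, 9)),
  E_array = c(rep(NA, 8), 31, 31, 32, 32, 34, 34, 34, 34, 34)
)

# On a data frame: any row with an NA is dropped; here every row has one
n_complete <- nrow(na.omit(df1))

# On a single column: only that column's NAs are dropped
# (as.vector() strips the "na.action" attribute na.omit() adds)
d_values <- as.vector(na.omit(df1$D_array))
```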
Answer:
df <- structure(list(C_array = c(20L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
D_array = c(NA, NA, NA, 17L, 17L, 21L, 49L, 52L, NA, NA, NA, NA, NA, NA, NA, NA, NA),
E_array = c(NA, NA, NA, NA, NA, NA, NA, NA, 31L, 31L, 32L, 32L, 34L, 34L, 34L, 34L, 34L)),
class = "data.frame", row.names = c(NA, -17L))
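# drop the NAs from each column; lapply() always returns a list
l <- lapply(df, function(x) x[!is.na(x)])
# pad each vector with NA to the longest length; sapply() binds them into a matrix
res <- sapply(l, function(x){length(x) <- max(lengths(l)); x})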
res
#> C_array D_array E_array
#> [1,] 20 17 31
#> [2,] 0 17 31
#> [3,] 0 21 32
#> [4,] NA 49 32
#> [5,] NA 52 34
#> [6,] NA NA 34
#> [7,] NA NA 34
#> [8,] NA NA 34
#> [9,] NA NA 34
Created on 2021-11-05 by the reprex package (v2.0.1)
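The padding relies on the replacement function length<-: giving a vector a larger length extends it with NA. The block also illustrates why lapply() is the safer first step: when every column happens to keep the same number of values, sapply() simplifies its result to a matrix, and lengths() on a matrix counts single elements rather than columns. A small sketch using a hypothetical two-column frame same_len:

```r
x <- c(20, 0, 0)
# Growing a vector's length pads it with NA: x is now 20 0 0 NA NA
length(x) <- 5

# Both columns keep exactly two values after dropping NAs
same_len <- data.frame(a = c(1, NA, 2), b = c(NA, 3, 4))

# sapply() simplifies equal-length results into a 2 x 2 matrix
via_sapply <- sapply(same_len, function(col) col[!is.na(col)])

# lapply() always returns a list, so lengths() counts per column
via_lapply <- lapply(same_len, function(col) col[!is.na(col)])
lengths(via_lapply)
```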
Answer:
Use stack, na.omit, and unstack over the columns, then adapt the lengths as in Moody's answer.
r <- stack(dat) |> na.omit() |> unstack() |>
{\(x) lapply(x, `length<-`, max(lengths(x)))}() |>
as.data.frame()
r
# C_array D_array E_array
# 1 20 17 31
# 2 0 17 31
# 3 0 21 32
# 4 NA 49 32
# 5 NA 52 34
# 6 NA NA 34
# 7 NA NA 34
# 8 NA NA 34
# 9 NA NA 34
This keeps the result free of the "na.action" attribute.
str(r)
# 'data.frame': 9 obs. of 3 variables:
# $ C_array: num 20 0 0 NA NA NA NA NA NA
# $ D_array: num 17 17 21 49 52 NA NA NA NA
# $ E_array: num 31 31 32 32 34 34 34 34 34
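For context on that remark: na.omit() records the positions it dropped in an "na.action" attribute (of class "omit") on its result. A minimal sketch on a plain vector, where as.vector() is one way to strip it:

```r
v <- c(20, NA, 0, NA)
cleaned <- na.omit(v)

# na.omit() stores the dropped positions (here 2 and 4) with class "omit"
dropped <- attr(cleaned, "na.action")
print(dropped)

# as.vector() removes all attributes, leaving only the remaining values
plain <- as.vector(cleaned)
```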
Note:
R.version.string
# [1] "R version 4.1.2 (2021-11-01)"
Data:
dat <- structure(list(C_array = c(20, 0, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), D_array = c(NA, NA, NA, 17,
17, 21, 49, 52, NA, NA, NA, NA, NA, NA, NA, NA, NA), E_array = c(NA,
NA, NA, NA, NA, NA, NA, NA, 31, 31, 32, 32, 34, 34, 34, 34, 34
)), class = "data.frame", row.names = c(NA, -17L))
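To see what the pipeline above does, the same steps can be run one at a time: stack() produces a long data frame with a values column and a factor ind naming the source column, na.omit() drops the NA rows, and unstack() splits values by ind, returning a list because the groups now differ in size. A step-by-step sketch, written without the native pipe so it should also run on R older than 4.1:

```r
dat <- data.frame(
  C_array = c(20, 0, 0, rep(NA, 14)),
  D_array = c(NA, NA, NA, 17, 17, 21, 49, 52, rep(NA, 9)),
  E_array = c(rep(NA, 8), 31, 31, 32, 32, 34, 34, 34, 34, 34)
)

long <- stack(dat)       # 3 columns x 17 rows = 51 rows of values + ind
kept <- na.omit(long)    # 17 rows with non-NA values
groups <- unstack(kept)  # a list, since the group sizes are 3, 5 and 9

# Pad each vector with NA up to the longest one, then rebuild the table
n <- max(lengths(groups))
r <- as.data.frame(lapply(groups, `length<-`, n))
```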