I have a dataframe that has missing values at each column, but at different rows. For simplicity, let's see the following dataframe (real dataframe is much more complex):
first_column <- c(1, 2, NA,NA)
second_column <- c(NA, NA, 4,9)
df <- data.frame(first_column, second_column)
and we get:
first_column second_column
1 1 NA
2 2 NA
3 NA 4
4 NA 9
Now, I want to reshape the dataframe, after removing these missing values. I want the following:
first_column second_column
1 1 4
2 2 9
Is there an automatic way to do it (real dataframe has dimensions 1800 x 33)?
CodePudding user response:
We may have to reorder the column values so that the NA
s are at the end and then use drop_na
library(dplyr)
library(tidyr)
df %>%
summarise(across(everything(), ~ .x[order(is.na(.x))])) %>%
drop_na()
-output
first_column second_column
1 1 4
2 2 9
If there are unequal distribution of NA
s in each column and wants to remove the row only if all the columns have NA at a particular row after ordering, use if_all/if_all
in filter
df %>%
mutate(across(everything(), ~ .x[order(is.na(.x))])) %>%
filter(if_any(everything(), complete.cases))
-output
first_column second_column
1 1 4
2 2 9
CodePudding user response:
One possible solution:
df_new = as.data.frame(lapply(df, function(x) x[!is.na(x)]))
first_column second_column
1 1 4
2 2 9