Home > Software engineering >  remove missing values from dataframe and reshape the dataframe with the non-missing ones
remove missing values from dataframe and reshape the dataframe with the non-missing ones

Time:10-20

I have a dataframe that has missing values at each column, but at different rows. For simplicity, let's see the following dataframe (real dataframe is much more complex):

first_column <- c(1, 2, NA,NA)
second_column <- c(NA, NA, 4,9)

df <- data.frame(first_column, second_column)

and we get:

  first_column second_column
1            1            NA
2            2            NA
3           NA             4
4           NA             9

Now, I want to reshape the dataframe, after removing these missing values. I want the following:

  first_column second_column
1            1            4
2            2            9

Is there an automatic way to do it (real dataframe has dimensions 1800 x 33)?

CodePudding user response:

We may have to reorder the column values so that the NAs are at the end and then use drop_na

library(dplyr)
library(tidyr)
df %>%
   summarise(across(everything(), ~ .x[order(is.na(.x))])) %>% 
   drop_na()

-output

   first_column second_column
1            1             4
2            2             9

If there are unequal distribution of NAs in each column and wants to remove the row only if all the columns have NA at a particular row after ordering, use if_all/if_all in filter

df %>%
   mutate(across(everything(), ~ .x[order(is.na(.x))])) %>%
   filter(if_any(everything(), complete.cases))

-output

  first_column second_column
1            1             4
2            2             9

CodePudding user response:

One possible solution:

df_new = as.data.frame(lapply(df, function(x) x[!is.na(x)]))

  first_column second_column
1            1             4
2            2             9
  • Related