Is there a command to consolidate columns in R?-CodePudding

If my dataframe looks something like this

employee <- c('John','Peter','Jolie', 'Susan', 'Mark', 'David')
a <- c(21000, NA, 26800, NA, NA ,NA)
b <- c(NA, 35000, NA, NA, NA, NA)
c <- c(NA,NA,NA,40000, 20000, 45000)
d <- as.Date(c('2010-11-1','2008-3-25',NA, NA,'2007-3-14', NA))
e <- as.Date(c(NA, NA, NA,'2014-6-1', NA,'2017-10-15'))
f <- as.Date(c( NA, NA,'2010-3-20',NA, NA, NA))
employ.data <- data.frame(employee, a,b,c,d,e,f)

is there a way to coerce the data to remove the NAs and consolidate this to three columns? So it would look like:

John   21000  2010-11-1
Peter  35000  2008-3-25
Jolie  26800  2010-3-20
Susan  40000  2014-6-1
Mark   20000  2007-3-14
David  45000  2017-10-15

CodePudding user response：

An option is coalesce from dplyr - coalesce columns a to c, and d to f separately

library(dplyr)
employ.data %>%
     transmute(employee, a = coalesce(a, b, c), 
                        date = coalesce(d, e, f))

-output

employee     a       date
1     John 21000 2010-11-01
2    Peter 35000 2008-03-25
3    Jolie 26800 2010-03-20
4    Susan 40000 2014-06-01
5     Mark 20000 2007-03-14
6    David 45000 2017-10-15

Or using base R

type.convert(as.data.frame(t(apply(employ.data, 1, 
     FUN = function(x) na.omit(x)))), as.is = TRUE)

CodePudding user response：

this is a base R solution that can also accommodate the above changes in the dataframe. Maybe it is not the best solution in terms of performance but it should work


data_return <- data.frame(matrix(NA, nrow(employ.data), 3))

i_lauf <- 1
tmp <- ""

for(i in 1:nrow(employ.data)) # i=1
{
  if( (ncol(employ.data) - sum(is.na(employ.data[i,]), na.rm = FALSE)) == 3) # chec for 3 entries
  {
    tmp <- ""
    
    for(j in 1:ncol(employ.data))
    {
      if( is.na(employ.data[i,j]) == FALSE)
      {
       if(j == 1)
       {
         tmp <- paste(employ.data[i,j])
       }else{
         tmp <- paste(tmp, employ.data[i,j], sep = "_") 
       }
        i_lauf <- i_lauf   1
      }
    }
    
    data_return[i, ] <- data.frame(matrix(strsplit(tmp, "_")[[1]], 1, 3))
  }
}

Greetings