If my dataframe looks something like this
employee <- c('John','Peter','Jolie', 'Susan', 'Mark', 'David')
a <- c(21000, NA, 26800, NA, NA ,NA)
b <- c(NA, 35000, NA, NA, NA, NA)
c <- c(NA,NA,NA,40000, 20000, 45000)
d <- as.Date(c('2010-11-1','2008-3-25',NA, NA,'2007-3-14', NA))
e <- as.Date(c(NA, NA, NA,'2014-6-1', NA,'2017-10-15'))
f <- as.Date(c( NA, NA,'2010-3-20',NA, NA, NA))
employ.data <- data.frame(employee, a,b,c,d,e,f)
is there a way to coerce the data to remove the NAs and consolidate this to three columns? So it would look like:
John 21000 2010-11-1
Peter 35000 2008-3-25
Jolie 26800 2010-3-20
Susan 40000 2014-6-1
Mark 20000 2007-3-14
David 45000 2017-10-15
CodePudding user response:
An option is coalesce
from dplyr
- coalesce
columns a to c, and d to f separately
library(dplyr)
employ.data %>%
transmute(employee, a = coalesce(a, b, c),
date = coalesce(d, e, f))
-output
employee a date
1 John 21000 2010-11-01
2 Peter 35000 2008-03-25
3 Jolie 26800 2010-03-20
4 Susan 40000 2014-06-01
5 Mark 20000 2007-03-14
6 David 45000 2017-10-15
Or using base R
type.convert(as.data.frame(t(apply(employ.data, 1,
FUN = function(x) na.omit(x)))), as.is = TRUE)
CodePudding user response:
this is a base R solution that can also accommodate the above changes in the dataframe. Maybe it is not the best solution in terms of performance but it should work
data_return <- data.frame(matrix(NA, nrow(employ.data), 3))
i_lauf <- 1
tmp <- ""
for(i in 1:nrow(employ.data)) # i=1
{
if( (ncol(employ.data) - sum(is.na(employ.data[i,]), na.rm = FALSE)) == 3) # chec for 3 entries
{
tmp <- ""
for(j in 1:ncol(employ.data))
{
if( is.na(employ.data[i,j]) == FALSE)
{
if(j == 1)
{
tmp <- paste(employ.data[i,j])
}else{
tmp <- paste(tmp, employ.data[i,j], sep = "_")
}
i_lauf <- i_lauf 1
}
}
data_return[i, ] <- data.frame(matrix(strsplit(tmp, "_")[[1]], 1, 3))
}
}
Greetings