I've transposed a matrix and now I have a dataframe that looks like this:
Placebo High Medium Low
<dbl> <dbl> <dbl> <dbl>
1 0.0400 NA NA NA
2 0.04 NA NA NA
3 0.0200 NA NA NA
4 0.03 NA NA NA
5 -0.00500 NA NA NA
6 0.0300 NA NA NA
7 NA -0.04 NA NA
8 NA NA 0.0100 NA
9 NA NA NA 0.0100
10 NA NA NA 0.04
I would like to remove the NAs so every value goes up, and I get something like:
Placebo High Medium Low
<dbl> <dbl> <dbl> <dbl>
1 0.0400 -0.04 0.0100 0.0100
2 0.04 0.04
3 0.0200
4 0.03
5 -0.00500
6 0.0300
It doens't matter if the columns do not have the same length.
CodePudding user response:
You can't end up with a data frame because you have columns of different lengths. You will need to end up with a list, by doing something like:
df <- read.table(text = "Index Placebo High Medium Low
1 0.0400 NA NA NA
2 0.04 NA NA NA
3 0.0200 NA NA NA
4 0.03 NA NA NA
5 -0.00500 NA NA NA
6 0.0300 NA NA NA
7 NA -0.04 NA NA
8 NA NA 0.0100 NA
9 NA NA NA 0.0100
10 NA NA NA 0.04 ", header = TRUE)
df["Index"] <- NULL
lapply(df, function(x) x[!is.na(x)])
# $Placebo
# [1] 0.040 0.040 0.020 0.030 -0.005 0.030
# $High
# [1] -0.04
# $Medium
# [1] 0.01
# $Low
# [1] 0.01 0.04
CodePudding user response:
You can have a dataframe that looks like your desired output. However, it's impractical to have such kind of dataframe.
The "empty" cells are filled with "", therefore, the column type will be character instead of numeric.
do.call(data.frame,
lapply(
sapply(df, function(x) x[!is.na(x)]),
function(y) append(y, rep("", max(apply(df, 2, function(x) sum(!is.na(x)))) - length(y))))
)
Placebo High Medium Low
1 0.04 -0.04 0.01 0.01
2 0.04 0.04
3 0.02
4 0.03
5 -0.005
6 0.03
Replacing "" with NA
might be better, which preserves the column types as numeric.
do.call(data.frame,
lapply(
sapply(df, function(x) x[!is.na(x)]),
function(y) append(y, rep(NA, max(apply(df, 2, function(x) sum(!is.na(x)))) - length(y))))
)
Placebo High Medium Low
1 0.040 -0.04 0.01 0.01
2 0.040 NA NA 0.04
3 0.020 NA NA NA
4 0.030 NA NA NA
5 -0.005 NA NA NA
6 0.030 NA NA NA
Here's a potentially faster method that only uses one lapply
instead of lapply
sapply
.
do.call(data.frame,
setNames(lapply(1:ncol(df), function(x)
append(
df[, x][!is.na(df[, x])],
rep(NA, max(apply(df, 2, function(x)
sum(!is.na(x)))) - length(df[, x][!is.na(df[, x])]))
)),
colnames(df)))