I want to collapse this data frame so NA's are removed. How to accomplish this? Thanks!!
id <- c(1,1,1,2,2,3,4,5,5)
q1 <- c(23,55,7,88,90,34,11,22,99)
df <- data.frame(id,q1)
df$row <- 1:nrow(df)
spread(df, id, q1)
row 1 2 3 4 5
1 23 NA NA NA NA
2 55 NA NA NA NA
3 7 NA NA NA NA
4 NA 88 NA NA NA
5 NA 90 NA NA NA
6 NA NA 34 NA NA
7 NA NA NA 11 NA
8 NA NA NA NA 22
9 NA NA NA NA 89
I want it to look like this:
1 2 3 4 5
23 88 34 11 22
55 90 NA NA 89
7 NA NA NA NA
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CodePudding user response:
The row
should be created on the sequence of 'id'. In addition, pivot_wider
would be a more general function compared to spread
library(dplyr)
library(tidyr)
df %>%
group_by(id) %>%
mutate(row = row_number()) %>%
ungroup %>%
pivot_wider(names_from = id, values_from = q1) %>%
select(-row)
-output
# A tibble: 3 × 5
`1` `2` `3` `4` `5`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 23 88 34 11 22
2 55 90 NA NA 99
3 7 NA NA NA NA
Or use dcast
library(data.table)
dcast(setDT(df), rowid(id) ~ id, value.var = 'q1')[, id := NULL][]
1 2 3 4 5
<num> <num> <num> <num> <num>
1: 23 88 34 11 22
2: 55 90 NA NA 99
3: 7 NA NA NA NA
CodePudding user response:
Here's a base R solution. I sort each column so the non-NA values are at the top, find the number of non-NA values in the column with the most non-NA values (n), and return the top n rows from the data frame.
library(tidyr)
id <- c(1,1,1,2,2,3,4,5,5)
q1 <- c(23,55,7,88,90,34,11,22,99)
df <- data.frame(id,q1)
df$row <- 1:nrow(df)
df <- spread(df, id, q1)
collapse_df <- function(df) {
move_na_to_bottom <- function(x) x[order(is.na(x))]
sorted <- sapply(df, move_na_to_bottom)
count_non_na <- function(x) sum(!is.na(x))
n <- max(apply(df, 2, count_non_na))
sorted[1:n, ]
}
collapse_df(df[, -1])