Home > OS >  Collapse data frame so NAs are removed
Collapse data frame so NAs are removed

Time:04-11

I want to collapse this data frame so NA's are removed. How to accomplish this? Thanks!!

id <- c(1,1,1,2,2,3,4,5,5)
q1 <- c(23,55,7,88,90,34,11,22,99)
df <- data.frame(id,q1)
df$row <- 1:nrow(df)
spread(df, id, q1)

  row  1  2  3  4  5
   1 23 NA NA NA NA
   2 55 NA NA NA NA
   3  7 NA NA NA NA
   4 NA 88 NA NA NA
   5 NA 90 NA NA NA
   6 NA NA 34 NA NA
   7 NA NA NA 11 NA
   8 NA NA NA NA 22
   9 NA NA NA NA 89

I want it to look like this:

    1  2  3  4  5
    23 88 34 11 22
    55 90 NA NA 89
     7 NA NA NA NA
 

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

CodePudding user response:

The row should be created on the sequence of 'id'. In addition, pivot_wider would be a more general function compared to spread

library(dplyr)
library(tidyr)
df %>% 
  group_by(id) %>%
  mutate(row = row_number()) %>% 
  ungroup %>%
  pivot_wider(names_from = id, values_from = q1) %>%
  select(-row)

-output

# A tibble: 3 × 5
    `1`   `2`   `3`   `4`   `5`
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    23    88    34    11    22
2    55    90    NA    NA    99
3     7    NA    NA    NA    NA

Or use dcast

library(data.table)
dcast(setDT(df), rowid(id) ~ id, value.var = 'q1')[, id := NULL][]
       1     2     3     4     5
   <num> <num> <num> <num> <num>
1:    23    88    34    11    22
2:    55    90    NA    NA    99
3:     7    NA    NA    NA    NA

CodePudding user response:

Here's a base R solution. I sort each column so the non-NA values are at the top, find the number of non-NA values in the column with the most non-NA values (n), and return the top n rows from the data frame.

library(tidyr)

id <- c(1,1,1,2,2,3,4,5,5)
q1 <- c(23,55,7,88,90,34,11,22,99)
df <- data.frame(id,q1)
df$row <- 1:nrow(df)
df <- spread(df, id, q1)

collapse_df <- function(df) {

    move_na_to_bottom <- function(x) x[order(is.na(x))]
    sorted <- sapply(df, move_na_to_bottom)

    count_non_na <- function(x) sum(!is.na(x))
    n <- max(apply(df, 2, count_non_na))

    sorted[1:n, ]

}

collapse_df(df[, -1])
  •  Tags:  
  • r
  • Related