Home > Blockchain >  How to move elements of a column up to top of dataframe in R
How to move elements of a column up to top of dataframe in R

Time:08-19

So, I have a dataframe that looks something like this:

A    B    C

a1   NA   NA

a2   NA   NA

NA   b1   NA

NA   NA   c1

NA   NA   c2

I want to make it look like this:

A    B    C

a1   b1   c1

a2   NA   c2

In other words, right now the dataframe is perfectly diagonal. I want to end up with 3 orderly columns and it doesn't matter to me if there are NAs after the last element in each column. How can I do this in R? Tidyverse solution would be best but not exclusively looking for one.

CodePudding user response:

We could loop across the columns, order based on the NA elements and then filter only rows having at least one non-NA

library(dplyr)
df1 %>%
  mutate(across(everything(), ~ .x[order(is.na(.x))])) %>%
  filter(if_any(everything(), complete.cases))

-output

   A    B  C
1 a1   b1 c1
2 a2 <NA> c2

Or using base R

df1[] <- lapply(df1, \(x)  x[order(is.na(x))])
df1[rowSums(!is.na(df1)) > 0,]
   A    B  C
1 a1   b1 c1
2 a2 <NA> c2

data

df1 <- structure(list(A = c("a1", "a2", NA, NA, NA), B = c(NA, NA, "b1", 
NA, NA), C = c(NA, NA, NA, "c1", "c2")), class = "data.frame", 
row.names = c(NA, 
-5L))

CodePudding user response:

Update: Code a little shorter:

df %>% 
  pivot_longer(everything()) %>% 
  arrange(name, value) %>% 
  mutate(x = max(parse_number(value), na.rm = TRUE)) %>% 
  group_by(name) %>% 
  slice(1:x[1]) %>% 
  pivot_wider(names_from = name, values_from = value, values_fn = list) %>% 
  unnest(cols = c(A, B, C))

Here is an alternative (longer :-)) way:

library(tidyverse)

df %>% 
  pivot_longer(
    everything()
  ) %>% 
  drop_na() %>% 
  group_by(name) %>% 
  summarise(cur_data()[seq(max(count(df, df$name)$n)),]) %>% 
  pivot_wider(values_fn = list) %>% 
  unnest(cols = c(A, B, C)) %>% 
  filter(!if_all(everything(), ~ is.na(.)))

 A     B     C    
  <chr> <chr> <chr>
1 a1    b1    c1   
2 a2    NA    c2  

CodePudding user response:

Another possible solution, based on purrr::map_dfc:

library(purrr)

map_dfc(df, ~ {y <- .x[!is.na(.x)]; length(y) <- max(colSums(!is.na(df))); y})

#> # A tibble: 2 × 3
#>   A     B     C    
#>   <chr> <chr> <chr>
#> 1 a1    b1    c1   
#> 2 a2    <NA>  c2
  • Related