Home > database >  Split a data frame by a factor and remove rows of unequal columns
Split a data frame by a factor and remove rows of unequal columns

Time:09-14

I'm trying to split columns in a dataframe and place them next to each other based on a single factor column. I would like to take this:

   TOD  Value
1  Day   135
2  Day   513
3  Day   567
4  Day   848
5  Day   578
6  Night 145
7  Night 267
8  Night 589
9  Night 258
10 Night 278
11 Night 149

and turn it into this:

   TOD  Value TOD_2 Value_2
1  Day   135  Night  145
2  Day   513  Night  267
3  Day   567  Night  589
4  Day   848  Night  258
5  Day   578  Night  278
  

The last row (11) needs to be removed so that all columns are of equal length. I can't have 0's or NA's as a 'filler' row. How do I do this? I have code that can accomplish the splitting part, but only if the columns are all equal:

df2<- df %>%
  group_split(TOD) %>%
  Map(function(x, y) {names(x) <- paste(names(x), y, sep = "_"); x}, ., 1:length(.)) %>%
  bind_cols()

this also works:

df2<- do.call(cbind, split(df, df$TOD)) |> 
  setNames(c(names(df), paste0(names(df), "_2")))

I need to add something that will remove extra rows if the 'Value_2' column exceeds the number of values in the original 'Value' column.

CodePudding user response:

It may be better to reshape to wide format which could work for most of the cases i.e. length same or not

library(dplyr)
library(data.table)
library(tidyr)
df %>%
  mutate(cn = match(TOD, unique(TOD)), rn = rowid(TOD)) %>% 
 pivot_wider(names_from = cn, values_from= c(TOD, Value)) %>% 
 select(-rn) %>%
 drop_na()

-output

# A tibble: 5 × 4
  TOD_1 TOD_2 Value_1 Value_2
  <chr> <chr>   <int>   <int>
1 Day   Night     135     145
2 Day   Night     513     267
3 Day   Night     567     589
4 Day   Night     848     258
5 Day   Night     578     278

Also, we may use qpcR:::cbind.na

na.omit(do.call(qpcR:::cbind.na, split(df, df$TOD)))
  TOD Value Night.TOD Night.Value
1 Day   135     Night         145
2 Day   513     Night         267
3 Day   567     Night         589
4 Day   848     Night         258
5 Day   578     Night         278

data

df <- structure(list(TOD = c("Day", "Day", "Day", "Day", "Day", "Night", 
"Night", "Night", "Night", "Night", "Night"), Value = c(135L, 
513L, 567L, 848L, 578L, 145L, 267L, 589L, 258L, 278L, 149L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))

CodePudding user response:

Here's a base solution:

result = split(df, df$TOD)

# truncate to the fewest number of rows
result = lapply(result, head, min(sapply(result, nrow)))

result = do.call(cbind, result)
result
#   Day.TOD Day.Value Night.TOD Night.Value
# 1     Day       135     Night         145
# 2     Day       513     Night         267
# 3     Day       567     Night         589
# 4     Day       848     Night         258
# 5     Day       578     Night         278

CodePudding user response:

What about this? Similar to your original attempt:

library(tidyverse)

df |>
  group_by(TOD) |>
  mutate(ID = row_number()) |>
  group_split() |>
  reduce(full_join, by = "ID", suffix = c("_1", "_2")) |>
  select(TOD_1, TOD_2, Value_1, Value_2) |>
  drop_na()
#> # A tibble: 5 x 4
#>   TOD_1 TOD_2 Value_1 Value_2
#>   <chr> <chr>   <int>   <int>
#> 1 Day   Night     135     145
#> 2 Day   Night     513     267
#> 3 Day   Night     567     589
#> 4 Day   Night     848     258
#> 5 Day   Night     578     278
  • Related