Split a data frame by a factor and remove rows of unequal columns-CodePudding

I'm trying to split columns in a dataframe and place them next to each other based on a single factor column. I would like to take this:

   TOD  Value
1  Day   135
2  Day   513
3  Day   567
4  Day   848
5  Day   578
6  Night 145
7  Night 267
8  Night 589
9  Night 258
10 Night 278
11 Night 149

and turn it into this:

   TOD  Value TOD_2 Value_2
1  Day   135  Night  145
2  Day   513  Night  267
3  Day   567  Night  589
4  Day   848  Night  258
5  Day   578  Night  278

The last row (11) needs to be removed so that all columns are of equal length. I can't have 0's or NA's as a 'filler' row. How do I do this? I have code that can accomplish the splitting part, but only if the columns are all equal:

df2<- df %>%
  group_split(TOD) %>%
  Map(function(x, y) {names(x) <- paste(names(x), y, sep = "_"); x}, ., 1:length(.)) %>%
  bind_cols()

this also works:

df2<- do.call(cbind, split(df, df$TOD)) |> 
  setNames(c(names(df), paste0(names(df), "_2")))

I need to add something that will remove extra rows if the 'Value_2' column exceeds the number of values in the original 'Value' column.

CodePudding user response：

It may be better to reshape to wide format which could work for most of the cases i.e. length same or not

library(dplyr)
library(data.table)
library(tidyr)
df %>%
  mutate(cn = match(TOD, unique(TOD)), rn = rowid(TOD)) %>% 
 pivot_wider(names_from = cn, values_from= c(TOD, Value)) %>% 
 select(-rn) %>%
 drop_na()

-output

# A tibble: 5 × 4
  TOD_1 TOD_2 Value_1 Value_2
  <chr> <chr>   <int>   <int>
1 Day   Night     135     145
2 Day   Night     513     267
3 Day   Night     567     589
4 Day   Night     848     258
5 Day   Night     578     278

Also, we may use qpcR:::cbind.na

na.omit(do.call(qpcR:::cbind.na, split(df, df$TOD)))
  TOD Value Night.TOD Night.Value
1 Day   135     Night         145
2 Day   513     Night         267
3 Day   567     Night         589
4 Day   848     Night         258
5 Day   578     Night         278

data

df <- structure(list(TOD = c("Day", "Day", "Day", "Day", "Day", "Night", 
"Night", "Night", "Night", "Night", "Night"), Value = c(135L, 
513L, 567L, 848L, 578L, 145L, 267L, 589L, 258L, 278L, 149L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))

CodePudding user response：

Here's a base solution:

result = split(df, df$TOD)

# truncate to the fewest number of rows
result = lapply(result, head, min(sapply(result, nrow)))

result = do.call(cbind, result)
result
#   Day.TOD Day.Value Night.TOD Night.Value
# 1     Day       135     Night         145
# 2     Day       513     Night         267
# 3     Day       567     Night         589
# 4     Day       848     Night         258
# 5     Day       578     Night         278

CodePudding user response：

What about this? Similar to your original attempt:

library(tidyverse)

df |>
  group_by(TOD) |>
  mutate(ID = row_number()) |>
  group_split() |>
  reduce(full_join, by = "ID", suffix = c("_1", "_2")) |>
  select(TOD_1, TOD_2, Value_1, Value_2) |>
  drop_na()
#> # A tibble: 5 x 4
#>   TOD_1 TOD_2 Value_1 Value_2
#>   <chr> <chr>   <int>   <int>
#> 1 Day   Night     135     145
#> 2 Day   Night     513     267
#> 3 Day   Night     567     589
#> 4 Day   Night     848     258
#> 5 Day   Night     578     278