I'm trying to split columns in a dataframe and place them next to each other based on a single factor column. I would like to take this:
TOD Value
1 Day 135
2 Day 513
3 Day 567
4 Day 848
5 Day 578
6 Night 145
7 Night 267
8 Night 589
9 Night 258
10 Night 278
11 Night 149
and turn it into this:
TOD Value TOD_2 Value_2
1 Day 135 Night 145
2 Day 513 Night 267
3 Day 567 Night 589
4 Day 848 Night 258
5 Day 578 Night 278
The last row (11) needs to be removed so that all columns are of equal length. I can't have 0's or NA's as a 'filler' row. How do I do this? I have code that can accomplish the splitting part, but only if the columns are all equal:
df2<- df %>%
group_split(TOD) %>%
Map(function(x, y) {names(x) <- paste(names(x), y, sep = "_"); x}, ., 1:length(.)) %>%
bind_cols()
this also works:
df2<- do.call(cbind, split(df, df$TOD)) |>
setNames(c(names(df), paste0(names(df), "_2")))
I need to add something that will remove extra rows if the 'Value_2' column exceeds the number of values in the original 'Value' column.
CodePudding user response:
It may be better to reshape to wide format which could work for most of the cases i.e. length same or not
library(dplyr)
library(data.table)
library(tidyr)
df %>%
mutate(cn = match(TOD, unique(TOD)), rn = rowid(TOD)) %>%
pivot_wider(names_from = cn, values_from= c(TOD, Value)) %>%
select(-rn) %>%
drop_na()
-output
# A tibble: 5 × 4
TOD_1 TOD_2 Value_1 Value_2
<chr> <chr> <int> <int>
1 Day Night 135 145
2 Day Night 513 267
3 Day Night 567 589
4 Day Night 848 258
5 Day Night 578 278
Also, we may use qpcR:::cbind.na
na.omit(do.call(qpcR:::cbind.na, split(df, df$TOD)))
TOD Value Night.TOD Night.Value
1 Day 135 Night 145
2 Day 513 Night 267
3 Day 567 Night 589
4 Day 848 Night 258
5 Day 578 Night 278
data
df <- structure(list(TOD = c("Day", "Day", "Day", "Day", "Day", "Night",
"Night", "Night", "Night", "Night", "Night"), Value = c(135L,
513L, 567L, 848L, 578L, 145L, 267L, 589L, 258L, 278L, 149L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
CodePudding user response:
Here's a base solution:
result = split(df, df$TOD)
# truncate to the fewest number of rows
result = lapply(result, head, min(sapply(result, nrow)))
result = do.call(cbind, result)
result
# Day.TOD Day.Value Night.TOD Night.Value
# 1 Day 135 Night 145
# 2 Day 513 Night 267
# 3 Day 567 Night 589
# 4 Day 848 Night 258
# 5 Day 578 Night 278
CodePudding user response:
What about this? Similar to your original attempt:
library(tidyverse)
df |>
group_by(TOD) |>
mutate(ID = row_number()) |>
group_split() |>
reduce(full_join, by = "ID", suffix = c("_1", "_2")) |>
select(TOD_1, TOD_2, Value_1, Value_2) |>
drop_na()
#> # A tibble: 5 x 4
#> TOD_1 TOD_2 Value_1 Value_2
#> <chr> <chr> <int> <int>
#> 1 Day Night 135 145
#> 2 Day Night 513 267
#> 3 Day Night 567 589
#> 4 Day Night 848 258
#> 5 Day Night 578 278