I have a long (one-row) data file with many values that needs to be broken up into multiple rows. The specifics of why I need to do this aren't important, but the logic is that column i should always be bigger than column i+1, i.e., the values along a row should be decreasing.
The best way I can think of is to break the data frame into multiple rows with an if-then style function: if column i >= column i-1, start a new row; if column i < column i-1, keep the value in the current row.
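A direct translation of that if-then rule into a loop might look like the following. It is only a sketch: `split_decreasing_runs` is a hypothetical helper name, it works on a plain numeric vector rather than on the one-row data frame, and it treats a value equal to its predecessor as the start of a new row, which is what the expected output (the two consecutive 1s) requires.

```r
# Hypothetical helper: split a non-empty numeric vector into strictly
# decreasing runs, starting a new run whenever a value is >= the one before it.
split_decreasing_runs <- function(x) {
  runs <- list()
  current <- x[1]
  for (i in seq_along(x)[-1]) {
    if (x[i] >= x[i - 1]) {
      # Not smaller than the previous value: close this run, start a new one
      runs[[length(runs) + 1]] <- current
      current <- x[i]
    } else {
      # Still decreasing: keep the value in the current run
      current <- c(current, x[i])
    }
  }
  runs[[length(runs) + 1]] <- current  # store the last open run
  runs
}

x <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)
split_decreasing_runs(x)
```

For the example data this returns a list with the runs 3 2 1, 2 1, 1 and 3 2 1. Growing a list inside a loop is fine at this size, but the vectorised answers below avoid the explicit loop.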
```r
# Example data with the same format as my real data
df <- data.frame(matrix(ncol = 9, nrow = 1))
df[1, ] <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)
```
I would like the result to look like this:
```
3 2 1
2 1
1
3 2 1
```
I'm not very proficient with functions that refer to the i-th position in a data frame, or with the kind of data manipulation this needs. Any advice would be appreciated.
Answer:
Here is a tidyverse solution. Please let me know if this solves your question:
```r
library(tidyverse)

df <- data.frame(matrix(ncol = 9, nrow = 1))
df[1, ] <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)

df %>%
  # One row per original column
  pivot_longer(cols = everything(), names_to = "vars") %>%
  # TRUE while the values keep decreasing (the first value always counts)
  mutate(smaller_than_prev = value < lag(value) | is.na(lag(value)),
         # Every FALSE starts a new run, so the running count is a run ID
         num_falses = cumsum(smaller_than_prev == FALSE)) %>%
  group_by(num_falses) %>%
  # Position of each value within its run
  mutate(row_num = row_number()) %>%
  pivot_wider(names_from = row_num, values_from = value,
              values_fill = NA, names_prefix = "var") %>%
  # Copy each run's values onto every row of that run, then keep one row
  fill(c(var1, var2, var3), .direction = "downup") %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(var1, var2, var3)
```
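The key step above is `num_falses`, which turns the comparison with the previous value into a row identifier. A minimal base R sketch of that same computation on a plain vector (here `x` is simply the example data, and the base R version is an illustration rather than part of the answer) shows how it numbers the runs:

```r
x <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)

# Same test as `value < lag(value) | is.na(lag(value))`: the first value has
# no predecessor, so it always counts as "smaller than previous".
smaller_than_prev <- c(TRUE, x[-1] < x[-length(x)])

# Each FALSE marks the start of a new run, so the cumulative count of FALSEs
# gives every value the ID of the run it belongs to.
num_falses <- cumsum(!smaller_than_prev)
num_falses

# Grouping by that ID recovers the four runs
split(x, num_falses)
```

Here `num_falses` is 0 0 0 1 1 2 3 3 3, which is exactly the grouping that `group_by(num_falses)` uses in the pipeline.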
Answer:
Splitting the vector into groups is simple, but how the data are finally stored depends on what you are trying to do with the result. Here is a simple way to split the data:
```r
vect <- unname(unlist(df))                   # Convert the data to a simple vector
cuts <- which(diff(vect) >= 0)               # Positions after which a new run starts
grps <- rep(seq_len(length(cuts) + 1),       # One group ID per run, repeated
            diff(c(0, cuts, length(vect))))  # for the length of that run
groups <- split(vect, grps)                  # Create a list containing the groups
groups
# $`1`
# [1] 3 2 1
#
# $`2`
# [1] 2 1
#
# $`3`
# [1] 1
#
# $`4`
# [1] 3 2 1
```
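A shorter way to number the groups, which also adapts automatically to however many runs the data contain, is a cumulative sum over the split points. This is a sketch on the example vector, not a change to the answer's logic:

```r
vect <- c(3, 2, 1, 2, 1, 1, 3, 2, 1)

# diff(vect) >= 0 is TRUE wherever a value is not smaller than the one before.
# Prepending TRUE for the first value and taking the cumulative sum numbers
# the runs 1, 2, 3, ... in order.
grps <- cumsum(c(TRUE, diff(vect) >= 0))
groups <- split(vect, grps)
groups
```

For the example `grps` is 1 1 1 2 2 3 4 4 4, so `groups` is the same four-element list as above.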
A data frame and a matrix are both rectangular: every row must have the same number of columns, so neither can store runs of different lengths directly. To convert the result to a matrix, we need to pad the shorter runs with missing values:
```r
maxno <- max(lengths(groups))  # How long is the longest run?
t(sapply(groups, function(x) c(x, rep(NA, maxno - length(x)))))
#   [,1] [,2] [,3]
# 1    3    2    1
# 2    2    1   NA
# 3    1   NA   NA
# 4    3    2    1
```
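An alternative for the padding step relies on the fact that indexing an R vector past its end returns `NA`, so `x[seq_len(maxno)]` pads each run on its own. The sketch below rebuilds `groups` by hand so it runs standalone, and `padded` is a hypothetical name for the result:

```r
groups <- list(`1` = c(3, 2, 1), `2` = c(2, 1), `3` = 1, `4` = c(3, 2, 1))

maxno <- max(lengths(groups))  # length of the longest run
# Indexing beyond a vector's length yields NA, which pads shorter runs
padded <- do.call(rbind, lapply(groups, function(x) x[seq_len(maxno)]))
padded

# Once every row has the same length, a data frame can hold it as well
as.data.frame(padded)
```

The values match the padded matrix above; the only difference is that this version builds the rows directly with `rbind` instead of transposing the result of `sapply`.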