I am searching for a simple dplyr or data.table solution. I need to sort rows of a large data frame, but only have a solution with for loops.
Here is a minimum example:
A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(20)
df = data.frame(col1 = sample(c(A,B),8,1), col2 = sample(c(A,B),8,1), col3 = sample(c(A,B),8,1))
col1 col2 col3
1 B1 B1 A1
2 B2 B1 A5
3 A3 A5 B1
4 B3 B2 B3
5 A2 B2 A2
6 A1 A1 B2
7 A2 A3 A4
8 A5 A5 A1
The expected output should be:
col1 col2 col3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5
So, the order of the rows for the sort algorithm is c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5')
with one exception. If there is already one of the B's in the first column we continue with the A's.
The next problem is, that I have three more columns in the data frame with different numbers which should be rearranged in the same order as these three columns.
CodePudding user response:
You can use apply
, factor
and sort
twice with different orders.
order1 = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5') #Main order
order2 = c('A1', 'A2', 'A3', 'A4', 'A5', 'B1', 'B2', 'B3') #Secondary order for rows with 1st column as "B"
startB <- grepl("B", df[, 1]) #Rows with 1st column being "B"
df <- data.frame(t(apply(df, 1, \(x) sort(factor(x, levels = order1)))))
df[startB, -1] <- t(apply(df[startB, ], 1, \(x) sort(factor(x[-1], levels = order2))))
output
X1 X2 X3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5
CodePudding user response:
Might be more than a little bit too convoluted, but a dplyr
and purrr
option might be:
map2_dfr(.x = df %>%
group_split(cond = as.numeric(grepl("^B", col1))),
.y = list(vec1, vec2),
~ .x %>%
mutate(pmap_dfr(across(c(starts_with("col"), - pluck(select(.x, "cond"), 1))),
function(...) set_names(c(...)[order(match(c(...), .y))], names(c(...))))))
col1 col2 col3 cond
<chr> <chr> <chr> <dbl>
1 B1 A3 A5 0
2 B2 A2 A2 0
3 B2 A1 A1 0
4 A2 A3 A4 0
5 A1 A5 A5 0
6 B1 A1 B1 1
7 B2 A5 B1 1
8 B3 B2 B3 1
CodePudding user response:
My solution so far:
A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(100)
N = 20
df_1 = data.frame(col1 = sample(c(A,B),N,1), col2 = sample(c(A,B),N,1), col3 = sample(c(A,B),N,1))
vec = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5')
df_2 = t(apply(df_1,1,function(x)match(x,vec)))
df_3 = t(apply(df_2,1,sort))
tr = rowSums(matrix(df_3 %in% c(1,2,3),nrow(df_3), ncol(df_3))) == 2
change = which((df_3[,2]*tr)!=0)
save = df_3[change,2]
df_3[change,2] = df_3[change,3]
df_3[change,3] = save
df_4 = matrix(vec[df_3],nrow(df_3),ncol(df_3))
from df_2
to df_3
the place of the number is changing and I can rearrange the other columns by that.
Looks a little bit complicated