Home > Blockchain >  Sort rows for a data frame
Sort rows for a data frame

Time:05-03

I am searching for a simple dplyr or data.table solution. I need to sort rows of a large data frame, but only have a solution with for loops.

Here is a minimum example:

A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(20)
df = data.frame(col1 = sample(c(A,B),8,1), col2 = sample(c(A,B),8,1), col3 = sample(c(A,B),8,1))

  col1 col2 col3
1   B1   B1   A1
2   B2   B1   A5
3   A3   A5   B1
4   B3   B2   B3
5   A2   B2   A2
6   A1   A1   B2
7   A2   A3   A4
8   A5   A5   A1

The expected output should be:

  col1 col2 col3
1   B1   A1   B1
2   B1   A5   B2
3   B1   A3   A5
4   B2   B3   B3
5   B2   A2   A2
6   B2   A1   A1
7   A2   A3   A4
8   A1   A5   A5

So, the order of the rows for the sort algorithm is c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5') with one exception. If there is already one of the B's in the first column we continue with the A's.

The next problem is, that I have three more columns in the data frame with different numbers which should be rearranged in the same order as these three columns.

CodePudding user response:

You can use apply, factor and sort twice with different orders.

order1 = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5') #Main order
order2 = c('A1', 'A2', 'A3', 'A4', 'A5', 'B1', 'B2', 'B3') #Secondary order for rows with 1st column as "B"
startB <- grepl("B", df[, 1]) #Rows with 1st column being "B"

df <- data.frame(t(apply(df, 1, \(x) sort(factor(x, levels = order1)))))
df[startB, -1] <- t(apply(df[startB, ], 1, \(x) sort(factor(x[-1], levels = order2))))

output

  X1 X2 X3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5

CodePudding user response:

Might be more than a little bit too convoluted, but a dplyr and purrr option might be:

map2_dfr(.x = df %>%
             group_split(cond = as.numeric(grepl("^B", col1))),
         .y = list(vec1, vec2),
         ~ .x %>%
             mutate(pmap_dfr(across(c(starts_with("col"), - pluck(select(.x, "cond"), 1))), 
                             function(...) set_names(c(...)[order(match(c(...), .y))], names(c(...)))))) 

  col1  col2  col3   cond
  <chr> <chr> <chr> <dbl>
1 B1    A3    A5        0
2 B2    A2    A2        0
3 B2    A1    A1        0
4 A2    A3    A4        0
5 A1    A5    A5        0
6 B1    A1    B1        1
7 B2    A5    B1        1
8 B3    B2    B3        1

CodePudding user response:

My solution so far:

A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(100)

N = 20

df_1 = data.frame(col1 = sample(c(A,B),N,1), col2 = sample(c(A,B),N,1), col3 = sample(c(A,B),N,1))
vec = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5')
df_2 = t(apply(df_1,1,function(x)match(x,vec)))

df_3 = t(apply(df_2,1,sort))

tr = rowSums(matrix(df_3 %in% c(1,2,3),nrow(df_3), ncol(df_3))) == 2

change = which((df_3[,2]*tr)!=0)

save = df_3[change,2]
df_3[change,2] = df_3[change,3]
df_3[change,3] = save

df_4 = matrix(vec[df_3],nrow(df_3),ncol(df_3))

from df_2 to df_3 the place of the number is changing and I can rearrange the other columns by that. Looks a little bit complicated

  • Related