How to sort a dataframe in the same order as a sorted plot with seqIplot function?

Time:05-12

I'm using the seqIplot function (TraMineR) to create a sorted sequence index plot (sortv = "from.end"). How can I get the data frame sorted in the same order? Thank you!

CodePudding user response:

It's a little bit unclear which data frame you are alluding to, but I assume you are talking about your sequence data.

The sorting is done in a two-step procedure:

  1. Create a sort index using order()
  2. Sort the data using this index
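
As a rough illustration of these two steps, here is a minimal base-R sketch on a made-up state table. The data, the column names t1 to t3, and the row names id1 to id4 are hypothetical and only serve the illustration. It assumes that sorting "from the end" means comparing the last column first and breaking ties with the earlier columns.

```r
# Toy state table (hypothetical data): one row per case, one column per time point
states <- data.frame(
  t1 = c("B", "A", "B", "A"),
  t2 = c("B", "A", "A", "A"),
  t3 = c("B", "A", "B", "B"),
  stringsAsFactors = TRUE
)
rownames(states) <- c("id1", "id2", "id3", "id4")

# Step 1: build the sort index, using the last column as the first sort key.
# unname() keeps column names from being matched to order()'s own arguments.
key.cols <- unname(as.list(states[, ncol(states):1]))
idx <- do.call(order, key.cols)

# Step 2: reorder the rows with that index
sorted.states <- states[idx, ]
print(sorted.states)
```

The same idx can be used to reorder any other table whose rows are in the same order as states.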

Below is an example using the actcal example data from {TraMineR}.

> library(TraMineR)
> 
> # actcal data set
> data(actcal)
> 
> # We use only a sample of 10 cases
> set.seed(1)
> actcal <- actcal[sample(nrow(actcal),10),]
> actcal.seq <- seqdef(actcal,13:24)
 [>] 4 distinct states appear in the data: 
     1 = A
     2 = B
     3 = C
     4 = D
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  A           A        A
     2  B           B        B
     3  C           C        C
     4  D           D        D
 [>] 10 sequences in the data set
 [>] min/max sequence length: 12/12
> 
> # here the sorting happens: 
> #     1) we create a sorting index with order
> #     2) we sort the data according to the index
> x <- as.data.frame(actcal.seq)
> sortvar <- do.call(order, x[,ncol(x):1])
> sorted.data <- actcal.seq[sortvar,]
> 
> # bonus: change rownumber to index position 
> # to ease comparison with seqplot output
> rownames(sorted.data) <- 1:nrow(sorted.data)
> 
> #Inspect data
> #unsorted data
> actcal.seq
     Sequence               
3649 A-A-A-A-A-A-A-A-A-A-A-A
6274 D-D-D-D-D-D-D-D-D-D-D-D
4130 C-C-C-C-C-C-C-C-C-C-C-C
3236 B-B-B-B-B-B-B-B-B-B-B-B
2302 B-B-B-B-B-B-B-B-B-B-B-B
2172 A-A-A-A-A-A-A-A-A-A-A-A
5671 D-D-D-D-D-C-C-C-C-D-D-D
4039 B-B-B-B-B-B-B-B-B-B-B-B
153  D-D-D-D-D-D-D-D-D-D-D-D
5712 B-B-B-B-B-B-B-B-B-B-B-B
> #sorted data
> sorted.data
   Sequence               
1  A-A-A-A-A-A-A-A-A-A-A-A
2  A-A-A-A-A-A-A-A-A-A-A-A
3  B-B-B-B-B-B-B-B-B-B-B-B
4  B-B-B-B-B-B-B-B-B-B-B-B
5  B-B-B-B-B-B-B-B-B-B-B-B
6  B-B-B-B-B-B-B-B-B-B-B-B
7  C-C-C-C-C-C-C-C-C-C-C-C
8  D-D-D-D-D-C-C-C-C-D-D-D
9  D-D-D-D-D-D-D-D-D-D-D-D
10 D-D-D-D-D-D-D-D-D-D-D-D
> 
> #Compare to plot output
> seqiplot(actcal.seq, sortv = "from.end")


CodePudding user response:

The TraMineRextras package has functions sorti and sortv that return the sort index and the sort variable, respectively. To sort the data, you need the sort index returned by sorti.

Here is an example using the actcal data.

library(TraMineRextras)

# actcal data set
data(actcal)
# using only sequences 11 to 20
actcal.seq <- seqdef(actcal[11:20,13:24])
sort.index <- sorti(actcal.seq, start="beg")
actcal.seq[sort.index,]

# 2103 A-A-A-A-A-A-A-A-A-A-A-A
# 528  A-A-A-A-A-A-A-A-A-A-A-A
# 4866 B-B-B-B-B-B-B-B-B-B-B-B
# 5108 B-B-B-B-B-B-B-B-B-B-B-B
# 5386 B-B-B-B-B-B-B-B-B-B-B-B
# 3876 B-B-B-B-B-B-B-B-B-B-B-B
# 5238 B-B-B-B-B-B-B-B-B-B-B-C
# 3972 C-C-C-C-C-C-C-C-C-B-B-B
# 4977 C-C-C-C-C-C-C-C-C-C-C-C
# 6175 D-D-D-D-D-D-D-D-D-D-D-D

With start="beg", you get the order corresponding to the sortv="from.start" argument of the plot function, and with start="end" the order corresponding to "from.end".

You can similarly use sort.index with any table whose rows match the sequences, such as actcal[11:20,] in the example above.
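
For the "from.end" order asked about in the question, a sketch along the following lines should work. It assumes TraMineR and TraMineRextras are installed; the object names sub, sub.seq and sub.sorted are made up for the illustration.

```r
library(TraMineRextras)  # also attaches TraMineR, which provides actcal and seqdef

data(actcal)

# Keep the full data frame for cases 11 to 20 (all columns, not only the states)
sub <- actcal[11:20, ]

# Build the sequence object from the monthly state columns 13 to 24
sub.seq <- seqdef(sub[, 13:24])

# Sort index matching sortv = "from.end" in the plot
sort.index <- sorti(sub.seq, start = "end")

# Apply the same index to the data frame and to the sequence object
sub.sorted <- sub[sort.index, ]
sub.seq.sorted <- sub.seq[sort.index, ]
```

Because both objects are reordered with the same index, row i of sub.sorted describes the same case as row i of sub.seq.sorted.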
