Home > Mobile >  How to organise data based on 2 factor levels matching
How to organise data based on 2 factor levels matching

Time:10-05

I have a data frame which looks like this:

df <- structure(list(Image = c("low_1_red", "low_1_red", "low_1_red", "low_1_red", "low_1_red", "low_1_red", "low_1_red", "low_1_red","low_1_red", "low_1_red", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_green", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue", "low_1_blue"), Mean = c(232, 233, 233, 233, 97,232, 232, 233, 232, 95, 98, 100, 98, 98, 231, 99, 98, 99, 99, 231, 232, 233, 233, 233, 231, 232, 232, 233, 232, 231), X = c(146, 39, 1129, 1252, 872, 1002, 300, 275, 17, 1521, 146, 39, 1129, 1252, 872, 1002, 300, 275, 17, 1521, 146, 39, 1129, 1252, 872, 1002, 300, 275, 17, 1521), Y = c(1056, 156, 1553, 878, 310, 108, 383, 726, 199, 1069, 1056, 156, 1553, 878, 310, 108, 383, 726, 199, 1069, 1056, 156, 1553, 878, 310, 108, 383, 726, 199, 1069)), row.names = c(NA,-30L), class = "data.frame")

which is the output from image analysis of 10 points (see X and Y coords), overlayed on one image ("low 1"), which has been split into red, green and blue layers. What I want is to match up the red, green and blue output from each point by their X and Y coordinates so I end up with a dataframe that looks like this:

df2 <- structure(list(Image = c("low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue", "low_1_red", "low_1_green", "low_1_blue"), Mean = c(232, 98, 232, 233, 100, 233, 233, 98, 233, 233, 98, 233, 97, 231, 231, 232, 99, 232, 232, 98, 232, 233, 99, 233, 232, 99, 232, 95, 231, 231), X = c(146, 146, 146, 39, 39, 39, 1129, 1129, 1129, 1252, 1252, 1252, 872, 872, 872, 1002, 1002, 1002, 300, 300, 300, 275, 275, 275, 17, 17, 17, 1521, 1521, 1521), Y = c(1056, 1056, 1056, 156, 156, 156, 1553, 1553, 1553, 878, 878, 878, 310, 310, 310, 108, 108, 108, 383, 383, 383, 726, 726, 726, 199, 199, 199, 1069, 1069, 1069)), row.names = c(NA, -30L), class = "data.frame")

I've sorted them manually here but my actual data frame is around 90,000 rows so I'm looking for a quick fix! Any help much appreciated :)

CodePudding user response:

You can use dplyr::arrange:

dplyr::arrange(df, X,Y, Image)

Output:

         Image Mean    X    Y
1   low_1_blue  232   17  199
2  low_1_green   99   17  199
3    low_1_red  232   17  199
4   low_1_blue  233   39  156
5  low_1_green  100   39  156
6    low_1_red  233   39  156
7   low_1_blue  232  146 1056
8  low_1_green   98  146 1056
9    low_1_red  232  146 1056
10  low_1_blue  233  275  726
11 low_1_green   99  275  726
12   low_1_red  233  275  726
13  low_1_blue  232  300  383
14 low_1_green   98  300  383
15   low_1_red  232  300  383
16  low_1_blue  231  872  310
17 low_1_green  231  872  310
18   low_1_red   97  872  310
19  low_1_blue  232 1002  108
20 low_1_green   99 1002  108
21   low_1_red  232 1002  108
22  low_1_blue  233 1129 1553
23 low_1_green   98 1129 1553
24   low_1_red  233 1129 1553
25  low_1_blue  233 1252  878
26 low_1_green   98 1252  878
27   low_1_red  233 1252  878
28  low_1_blue  231 1521 1069
29 low_1_green  231 1521 1069
30   low_1_red   95 1521 1069

Faster option (you might consider if you have large data) is to set your df to data.table and use order

data.table::setDT(df)[order(X,Y,Image)]

Update: add group column:

library(data.table)
result = setDT(df)[order(X,Y,Image)][, grp:=rleid(X,Y)][]

Output:

          Image Mean    X    Y grp
 1:  low_1_blue  232   17  199   1
 2: low_1_green   99   17  199   1
 3:   low_1_red  232   17  199   1
 4:  low_1_blue  233   39  156   2
 5: low_1_green  100   39  156   2
 6:   low_1_red  233   39  156   2
 7:  low_1_blue  232  146 1056   3
 8: low_1_green   98  146 1056   3
 9:   low_1_red  232  146 1056   3
10:  low_1_blue  233  275  726   4
11: low_1_green   99  275  726   4
12:   low_1_red  233  275  726   4
13:  low_1_blue  232  300  383   5
14: low_1_green   98  300  383   5
15:   low_1_red  232  300  383   5
16:  low_1_blue  231  872  310   6
17: low_1_green  231  872  310   6
18:   low_1_red   97  872  310   6
19:  low_1_blue  232 1002  108   7
20: low_1_green   99 1002  108   7
21:   low_1_red  232 1002  108   7
22:  low_1_blue  233 1129 1553   8
23: low_1_green   98 1129 1553   8
24:   low_1_red  233 1129 1553   8
25:  low_1_blue  233 1252  878   9
26: low_1_green   98 1252  878   9
27:   low_1_red  233 1252  878   9
28:  low_1_blue  231 1521 1069  10
29: low_1_green  231 1521 1069  10
30:   low_1_red   95 1521 1069  10
  • Related