Home > Blockchain >  Order column based on data in another column in R (tidyverse)
Order column based on data in another column in R (tidyverse)

Time:11-02

Ideally looking for a tidyverse solution to the following.

I have data which comes to me in random order. Every row has a unique four digit identifier (labeled ID below). I group it initially by a date column (simple enough). Another column indicates what the previous row should be within that date (labeled Prev_ID below). The first row for a given date has an Prev_ID which will not refer to something within that date. I want an output ordered like the following:

Date    ID    Prev_ID
1/1    8731    1436  
1/1    4522    8731
1/1    1301    4522
1/2    6675    3690
1/2    9123    6675
1/2    0025    9123

I've tried creating some loops to do this but I'm getting a bit lost in the weeds, and because I'm dealing with a lot of data, I'm worried about it being slow. Seems like there should be a simple solution here.

Thoughts?

CodePudding user response:

Randomize the row-order:

set.seed(42)
dat <- dat[sample(nrow(dat)),]
dat
#   Date   ID Prev_ID
# 1  1/1 8731    1436
# 5  1/2 9123    6675
# 6  1/2 0025    9123
# 4  1/2 6675    3690
# 2  1/1 4522    8731
# 3  1/1 1301    4522

Sort based on Prev_ID in ID:

library(dplyr)
dat %>%
  group_by(Date) %>%
  mutate(otherrow = match(Prev_ID, ID)) %>%
  arrange(Date, !is.na(otherrow), otherrow) %>%
  ungroup() %>%
  select(-otherrow)
# # A tibble: 6 x 3
#   Date  ID    Prev_ID
#   <chr> <chr> <chr>  
# 1 1/1   8731  1436   
# 2 1/1   4522  8731   
# 3 1/1   1301  4522   
# 4 1/2   6675  3690   
# 5 1/2   0025  9123   
# 6 1/2   9123  6675   

Note: your Date as a string is not going to be good for sort-order in the long term once you get month-days over 9, noting that sort(c("1/2", "1/11")) puts 1/11 before 1/2 due to the lexicographic sort priorities. You can either convert to a "real" Date-object (you'd need the year component), or you can convert the month-day to be zero-padded with dat$Date <- sub("/([0-9])$", "/0\\1", dat$Date).


Data

dat <- structure(list(Date = c("1/1", "1/2", "1/2", "1/2", "1/1", "1/1"), ID = c("8731", "9123", "0025", "6675", "4522", "1301"), Prev_ID = c("1436", "6675", "9123", "3690", "8731", "4522")), row.names = c(1L, 5L, 6L, 4L, 2L, 3L), class = "data.frame")
  • Related