Ideally looking for a tidyverse solution to the following.
I have data which comes to me in random order. Every row has a unique four digit identifier (labeled ID below). I group it initially by a date column (simple enough). Another column indicates what the previous row should be within that date (labeled Prev_ID below). The first row for a given date has an Prev_ID which will not refer to something within that date. I want an output ordered like the following:
Date ID Prev_ID
1/1 8731 1436
1/1 4522 8731
1/1 1301 4522
1/2 6675 3690
1/2 9123 6675
1/2 0025 9123
I've tried creating some loops to do this but I'm getting a bit lost in the weeds, and because I'm dealing with a lot of data, I'm worried about it being slow. Seems like there should be a simple solution here.
Thoughts?
CodePudding user response:
Randomize the row-order:
set.seed(42)
dat <- dat[sample(nrow(dat)),]
dat
# Date ID Prev_ID
# 1 1/1 8731 1436
# 5 1/2 9123 6675
# 6 1/2 0025 9123
# 4 1/2 6675 3690
# 2 1/1 4522 8731
# 3 1/1 1301 4522
Sort based on Prev_ID
in ID
:
library(dplyr)
dat %>%
group_by(Date) %>%
mutate(otherrow = match(Prev_ID, ID)) %>%
arrange(Date, !is.na(otherrow), otherrow) %>%
ungroup() %>%
select(-otherrow)
# # A tibble: 6 x 3
# Date ID Prev_ID
# <chr> <chr> <chr>
# 1 1/1 8731 1436
# 2 1/1 4522 8731
# 3 1/1 1301 4522
# 4 1/2 6675 3690
# 5 1/2 0025 9123
# 6 1/2 9123 6675
Note: your Date
as a string is not going to be good for sort-order in the long term once you get month-days over 9, noting that sort(c("1/2", "1/11"))
puts 1/11
before 1/2
due to the lexicographic sort priorities. You can either convert to a "real" Date
-object (you'd need the year component), or you can convert the month-day to be zero-padded with dat$Date <- sub("/([0-9])$", "/0\\1", dat$Date)
.
Data
dat <- structure(list(Date = c("1/1", "1/2", "1/2", "1/2", "1/1", "1/1"), ID = c("8731", "9123", "0025", "6675", "4522", "1301"), Prev_ID = c("1436", "6675", "9123", "3690", "8731", "4522")), row.names = c(1L, 5L, 6L, 4L, 2L, 3L), class = "data.frame")