R Merging two dataframes of different lengths, repeating rows of the shorter until a value change in

Time:11-11

I'd like to merge two data frames of different sizes. Both have timestamps, but the timestamps do not overlap. I can get the timestamps from the smaller data frame into the larger one easily enough, but my goal is to repeat each row of the smaller data frame until the next timestamp, where the smaller data frame indicates a parameter change. A minimal reproducible example to demonstrate [EDIT based on comments below]:

library(dplyr)  # provides arrange() and row_number()

set.seed(123)
gratingspeed <- sample(-3:3, 10, replace = TRUE)
gratingfreq <- sample(c(-2, -1, 0.5, 0.5, 1, 2), 10, replace = TRUE)
timestamp <- runif(10, min = 0, max = 25)
timestamp[1] <- 0
stimuli <- data.frame(gratingspeed, gratingfreq, timestamp)
stimuli <- arrange(stimuli, timestamp)  # sort by stimulus onset time
timemultiplier <- rep(c(-1, 1), 5)
stimuli <- cbind(timemultiplier, stimuli)
stimuli$dfid <- row_number(stimuli$timestamp)  # index of each stimulus

trackingx <- runif(25, min=0, max=800)
trackingy <- runif(25, min=0, max=800)
boutsuccess <- sample(c(rep(TRUE, 23), rep(FALSE, 2)), 25, replace = TRUE)
timestamp <- 1:25
# cbind() builds a numeric matrix, so boutsuccess is stored as 0/1
trackingdata <- cbind(trackingx, trackingy, boutsuccess, timestamp)
trackingdata <- as.data.frame(trackingdata)
trackingdata$dfid <- NA

I found that merge(sort=FALSE) seems to be the best way to do this, but I need a common column to merge the two data frames on. I built a dfid column that acts as an index for this process, like so:

trialdummy <- stimuli
trialdummy[c(1:3)] <- NA  
names(trialdummy)[1:3] <- c("trackingx", "trackingy", "boutsuccess")
trialdata <- rbind(trialdummy, trackingdata)
trialdata <- trialdata %>%
  arrange(timestamp)

But I need the dfid indices to repeat until the value changes, perhaps using rle(). Any help with either building the proper dfid column so I can use merge(sort=FALSE), or a better way to merge the two original data frames, would be much appreciated. Tidyverse answers are welcome but not necessary. Thank you!

Answer:

I think you need tidyr::fill(), which carries the last non-missing dfid down over the NA rows that follow it:

library(tidyr)
trialdata %>% 
  fill(dfid, .direction = "down")
#    trackingx   trackingy boutsuccess timestamp dfid
# 1         NA          NA          NA  0.000000    1
# 2  721.83924 102.0253202           1  1.000000    1
# 3  552.56422 602.6462914           1  2.000000    1
# 4  636.37393 716.0362873           1  3.000000    1
# 5         NA          NA          NA  3.677841    2
# 6   19.69095 299.5702207           1  4.000000    2
# 7  382.23678 532.0921557           1  5.000000    2
# 8  606.76763  75.8725287           1  6.000000    2
# 9  173.12635 307.1757102           1  7.000000    2
# 10        NA          NA          NA  7.228993    3
# 11 254.54481 219.5069157           1  8.000000    3
# 12 185.30063 651.7120311           1  9.000000    3
# ...
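
Once dfid is filled down, every tracking row carries the index of the stimulus in effect at that time, so the stimulus parameters can be attached with a join on dfid. The following is only a minimal sketch of that idea, assuming dplyr and tidyr are installed. The two small data frames are made-up stand-ins for the question's stimuli and trackingdata (same column names, illustrative values), and merged is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the question's data frames (values are made up).
stimuli <- data.frame(
  gratingspeed = c(-3, 2, 1),
  gratingfreq  = c(0.5, 1, 2),
  timestamp    = c(0, 2.5, 4.2),
  dfid         = 1:3
)
trackingdata <- data.frame(
  trackingx = c(10, 20, 30, 40, 50),
  trackingy = c(15, 25, 35, 45, 55),
  timestamp = 1:5
)

merged <- bind_rows(
  select(stimuli, timestamp, dfid),  # marker rows: only time and index
  trackingdata                       # tracking rows: dfid starts as NA
) %>%
  arrange(timestamp) %>%
  fill(dfid, .direction = "down") %>%  # repeat dfid until the next stimulus
  filter(!is.na(trackingx)) %>%        # drop the stimulus marker rows
  left_join(select(stimuli, -timestamp), by = "dfid")  # attach parameters
```

In this sketch the filter step assumes that real tracking rows never have a missing trackingx. For stimulus timestamps sorted in increasing order, base R's findInterval(trackingdata$timestamp, stimuli$timestamp) should produce the same index without stacking the two data frames first.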

If you're using dplyr already, bind_rows is a more flexible and more efficient alternative to rbind.
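
As a rough illustration of that flexibility: bind_rows() matches columns by name and fills any column missing from one input with NA, so the renamed dummy data frame from the question would not be needed. The frames a and b below are hypothetical toy data, not the question's.

```r
library(dplyr)

# Hypothetical toy frames whose columns only partly overlap.
a <- data.frame(timestamp = c(0, 3.5), dfid = 1:2)
b <- data.frame(trackingx = c(100, 200), timestamp = c(1, 2))

# Columns are matched by name; missing ones are filled with NA.
combined <- bind_rows(a, b) %>%
  arrange(timestamp)
```

By contrast, base rbind() on data frames stops with an error when the column names of its inputs do not match.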
