Add a column to a data frame based on whether a value falls within a range in a corresponding data frame

Time:10-22

I have a file with positions, 1..16569 (1-based), and a file with feature information (gene name, etc.). I want to build one data frame in which each position in df_positions is labeled with the feature whose range, given by df_features$start and df_features$end, contains that position.

I've shortened the example data below to save space.

df_positions = data.frame(
 chromosome = rep('MT', 10),
 positions = 1:10,
 depth = c(rep(6,3), rep(7,3), rep(8,2), rep(10,2)),
 stringsAsFactors = FALSE
)

df_features = data.frame(
 chromosome = rep('MT', 2),
 start = c(1,4),
 end = c(3,10),
 feature = c('TRNF', 'RNR1'),
 stringsAsFactors = FALSE
)

This is what I want the data to look like afterwards:

chromosome  positions  depth  feature
MT          1          6      TRNF
MT          2          6      TRNF
MT          3          6      TRNF
MT          4          7      RNR1
MT          5          7      RNR1
MT          6          7      RNR1
MT          7          8      RNR1
MT          8          8      RNR1
MT          9          10     RNR1
MT          10         10     RNR1

Here is what I have tried:

x <- df_positions %>% mutate(feature = ifelse(between(df_positions$positions, df_features$start, df_features$end), df_features$feature, ''))

This doesn't work. I think the dplyr function doesn't know to check each position against every (start, end) pair. Is there a way to do this in R? I'm looking into plyr::mapvalues and will probably try a for loop next.
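One way to see why the elementwise attempt cannot work is to look at how base R recycles the two range bounds against the ten positions. The sketch below uses only base R; the variable names (`recycled`, `in_range`) are made up for illustration, and `outer()` is shown merely as one possible way to compare every position with every range.

```r
positions <- 1:10
start <- c(1, 4)
end   <- c(3, 10)

# Elementwise comparison recycles start/end: odd positions are tested
# against range 1 (1..3), even positions against range 2 (4..10).
recycled <- positions >= start & positions <= end
print(recycled)
# TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE

# A pairwise comparison tests every position against every range:
# rows are positions, columns are ranges.
in_range <- outer(positions, start, ">=") & outer(positions, end, "<=")
print(in_range)
```

In the pairwise matrix, each row has exactly one TRUE for this example, which is the information the elementwise version loses.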

Thanks.

CodePudding user response:

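df_positions <- data.frame(
  chromosome = rep('MT', 10),
  positions = 1:10,
  depth = c(rep(6, 3), rep(7, 3), rep(8, 2), rep(10, 2)),
  stringsAsFactors = FALSE
)

# One row per feature, so chromosome must have length 2 here
df_features <- data.frame(
  chromosome = rep('MT', 2),
  start = c(1, 4),
  end = c(3, 10),
  feature = c('TRNF', 'RNR1'),
  stringsAsFactors = FALSE
)

# apply() passes each row as a character vector, hence as.integer()
df_positions$feature <- apply(df_positions, 1, function(x) {
  # rows of df_features on the same chromosome whose range contains the position
  idx <- which(df_features$chromosome == x[ 'chromosome' ] &
                 df_features$start <= as.integer(x[ 'positions' ]) &
                 df_features$end >= as.integer(x[ 'positions' ]))
  # first matching feature; NA if no range contains the position
  df_features[ idx, 'feature' ][ 1 ]
})

# opens the result in the data viewer (interactive sessions)
View(df_positions)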
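If the row-wise apply() is too slow on the full 16569 positions, a vectorized variant is possible in base R. The following is a minimal sketch, not part of the answer above: it assumes the same example data, joins every position to every feature on the same chromosome with merge(), and then keeps only the pairs where the position lies inside the range. The names `merged` and `result` are illustrative. Note that, unlike the apply() version, positions that fall in no range are dropped rather than set to NA.

```r
df_positions <- data.frame(
  chromosome = rep('MT', 10),
  positions = 1:10,
  depth = rep(c(6, 7, 8, 10), c(3, 3, 2, 2)),
  stringsAsFactors = FALSE
)
df_features <- data.frame(
  chromosome = rep('MT', 2),
  start = c(1, 4),
  end = c(3, 10),
  feature = c('TRNF', 'RNR1'),
  stringsAsFactors = FALSE
)

# All position/feature pairs that share a chromosome (10 x 2 = 20 rows here)
merged <- merge(df_positions, df_features, by = "chromosome")

# Keep only pairs where the position lies inside [start, end]
merged <- merged[merged$positions >= merged$start &
                   merged$positions <= merged$end, ]

# Restore the original order and drop the helper columns
result <- merged[order(merged$positions),
                 c("chromosome", "positions", "depth", "feature")]
rownames(result) <- NULL
print(result)
```

The intermediate table grows with the number of positions times the number of features per chromosome, so this trades memory for speed.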

CodePudding user response:

I figured it out using a loop, but if anyone has a more idiomatic R solution, I'd appreciate seeing it!

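library(dplyr)  # for %>% and pull()

# Column names follow the example data above (positions, feature)
positions <- df_positions %>% pull(positions)
data <- rep(NA_character_, length(positions))  # stays NA where no feature matches

for (row in seq_len(nrow(df_features))) {
  check <- df_features[row, ]
  for (i in seq_along(positions)) {
    # assign by index so the result stays aligned with df_positions
    if (positions[i] >= check$start && positions[i] <= check$end) {
      data[i] <- check$feature
    }
  }
}

df_positions$feature <- data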
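As one possible answer to the request for a more idiomatic version, the loop could be replaced by a range join. This is a sketch under the assumption that dplyr 1.1.0 or later is installed, since it relies on join_by() with its between() helper. The data frames are the example ones from the question, and `annotated` is just an illustrative name.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides join_by()

df_positions <- data.frame(
  chromosome = rep('MT', 10),
  positions = 1:10,
  depth = c(rep(6, 3), rep(7, 3), rep(8, 2), rep(10, 2)),
  stringsAsFactors = FALSE
)
df_features <- data.frame(
  chromosome = rep('MT', 2),
  start = c(1, 4),
  end = c(3, 10),
  feature = c('TRNF', 'RNR1'),
  stringsAsFactors = FALSE
)

# Match on chromosome, and require start <= positions <= end.
# left_join keeps positions without a matching feature (feature is NA).
annotated <- df_positions %>%
  left_join(df_features,
            by = join_by(chromosome, between(positions, start, end))) %>%
  select(chromosome, positions, depth, feature)

print(annotated)
```

If feature ranges overlap, a position would appear once per matching feature, so the result can have more rows than df_positions.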

done.
