Home > Blockchain >  Rearranging a data frame by equal strings
Rearranging a data frame by equal strings

Time:11-21

I have a data frame with the following structure:

pos<- c(67,125,158,195,235,458,499,526,785,912,999,1525)
v_1<-c("j","c","v","r","s","q","r","r","s","t","u","v")
v_2<-c("c","t","v","r","s","q","r","w","c","c","o","v")
v_3<-c("z","c","v","r","s","q","r","w","c","b","p","v")
v_4<-c("x","w","z","z","s","q","r","w","c","o","t","v")
data<-as.data.frame(cbind(pos,v_1,v_2,v_3,v_4))

In this dataframe it is possible to find the same letters among the different columns in consecutive rows. I need to obtain a separate data frame with the values of the variable "pos" for these shared letters in continuos rows, as can bee seen in the figure: enter image description here

In this figure even all the columsn have the same letter in pos 1525 Is not importat since is not in a consecutive row. What I need to get is the dataframe in the right.

CodePudding user response:

Solution using tidyr and dplyr:

library(tidyr)
library(dplyr)

data %>%
  pivot_longer(!pos) %>%
  add_count(pos, value) %>%
  group_by(name) %>%
  mutate(consec = ifelse(
    n > 1 & (lag(n) > 1 | lead(n) > 1),
    pos,
    NA
  )) %>%
  summarize(
    start = min(consec, na.rm = TRUE),
    end = max(consec, na.rm = TRUE)
  )
# A tibble: 4 × 3
  name  start end  
  <chr> <chr> <chr>
1 v_1   125   499  
2 v_2   158   785  
3 v_3   125   785  
4 v_4   235   785  

Note, not sure if/how you want to handle if there is more than one set of consecutive rows, so this solution doesn’t address that.

  • Related