I have a data frame with the following structure:
pos<- c(67,125,158,195,235,458,499,526,785,912,999,1525)
v_1<-c("j","c","v","r","s","q","r","r","s","t","u","v")
v_2<-c("c","t","v","r","s","q","r","w","c","c","o","v")
v_3<-c("z","c","v","r","s","q","r","w","c","b","p","v")
v_4<-c("x","w","z","z","s","q","r","w","c","o","t","v")
data<-as.data.frame(cbind(pos,v_1,v_2,v_3,v_4))
In this dataframe it is possible to find the same letters among the different columns in consecutive rows. I need to obtain a separate data frame with the values of the variable "pos" for these shared letters in continuos rows, as can bee seen in the figure:
In this figure even all the columsn have the same letter in pos 1525 Is not importat since is not in a consecutive row. What I need to get is the dataframe in the right.
CodePudding user response:
Solution using tidyr and dplyr:
library(tidyr)
library(dplyr)
data %>%
pivot_longer(!pos) %>%
add_count(pos, value) %>%
group_by(name) %>%
mutate(consec = ifelse(
n > 1 & (lag(n) > 1 | lead(n) > 1),
pos,
NA
)) %>%
summarize(
start = min(consec, na.rm = TRUE),
end = max(consec, na.rm = TRUE)
)
# A tibble: 4 × 3
name start end
<chr> <chr> <chr>
1 v_1 125 499
2 v_2 158 785
3 v_3 125 785
4 v_4 235 785
Note, not sure if/how you want to handle if there is more than one set of consecutive rows, so this solution doesn’t address that.