How to find a sequence within a column using R

Time:12-18

I am fairly new to R. I am trying to figure out a way to verify a sequence within a column. I tried using seq(), but it did not get me very far.

Here is a sample of the df

    gp <- data.frame(Id = c(1503960366, 1503960366, 1503960366, 4319703577, 4319703577, 4319703577, 5553957443, 5553957443, 5553957443),
                     date = c("2016-04-27", "2016-04-12", "2016-04-27", "2016-04-12", "2016-04-27", "2016-04-27", "2016-5-16", "2016-4-16", "2016-5-16"),
                     Cal = c(1347, 1347, 1348, 1496, 1497, 1496, 1688, 1688, 1688))

The sequence is in the Cal column. Within each Id, Cal may increase by 1 from one row to the next. What I want to do is find those increases and create a new column that flags, as TRUE or FALSE, whether each row's Cal is exactly 1 more than the previous row's for that Id.

##This is the printed-out version of the df.
        Id date         Cal
      <dbl> <chr>      <dbl>
1 1503960366 2016-04-27  1347
2 1503960366 2016-04-12  1347
3 1503960366 2016-04-27  1348
4 4319703577 2016-04-12  1496
5 4319703577 2016-04-27  1497
6 4319703577 2016-04-27  1496
7 5553957443 2016-5-16   1688
8 5553957443 2016-4-16   1688
9 5553957443 2016-5-16   1688
##This is the outcome I am looking for

         Id date         Cal  Verify
      <dbl> <chr>      <dbl>   <lgl>
1 1503960366 2016-04-27  1347   False
2 1503960366 2016-04-12  1347   False
3 1503960366 2016-04-27  1348   True
4 4319703577 2016-04-12  1496   False
5 4319703577 2016-04-27  1497   True
6 4319703577 2016-04-27  1496   False 
7 5553957443 2016-5-16   1688   False
8 5553957443 2016-4-16   1688   False
9 5553957443 2016-5-16   1688   False

Any help or direction in the right place will be greatly appreciated. Thanks in advance.

CodePudding user response:

Subtract the previous Cal value from the current one and check whether the difference is equal to 1.

library(dplyr)

df %>%
  mutate(Verify = Cal - lag(Cal, default = 0) == 1)

#          Id       date  Cal Verify
#1 1503960366 2016-04-27 1347  FALSE
#2 1503960366 2016-04-12 1347  FALSE
#3 1503960366 2016-04-27 1348   TRUE
#4 4319703577 2016-04-12 1496  FALSE
#5 4319703577 2016-04-27 1497   TRUE
#6 4319703577 2016-04-27 1496  FALSE
#7 5553957443  2016-5-16 1688  FALSE
#8 5553957443  2016-4-16 1688  FALSE
#9 5553957443  2016-5-16 1688  FALSE
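The call above compares every row with the one before it, regardless of Id. Since the question asks for the check per Id, a grouped variant may be closer to the intent. The following is a minimal sketch, assuming dplyr is installed; the name `grouped` and the use of `default = first(Cal)` are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same sample data as in the question (date column omitted for brevity).
df <- data.frame(
  Id  = c(1503960366, 1503960366, 1503960366,
          4319703577, 4319703577, 4319703577,
          5553957443, 5553957443, 5553957443),
  Cal = c(1347L, 1347L, 1348L, 1496L, 1497L, 1496L, 1688L, 1688L, 1688L)
)

grouped <- df %>%
  group_by(Id) %>%
  # Within each Id, compare Cal with the previous row. The first row of a
  # group is compared with itself, so its difference is 0 and Verify is FALSE.
  mutate(Verify = Cal - lag(Cal, default = first(Cal)) == 1) %>%
  ungroup()

print(grouped)
```

On this sample the grouped and ungrouped results are identical. They would differ only if the first Cal of one Id happened to be exactly 1 more than the last Cal of the preceding Id.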

In base R -

df$Verify <- c(FALSE, df$Cal[-1] - df$Cal[-nrow(df)] == 1)

data

df <- structure(list(Id = c(1503960366, 1503960366, 1503960366, 4319703577, 
4319703577, 4319703577, 5553957443, 5553957443, 5553957443), 
    date = c("2016-04-27", "2016-04-12", "2016-04-27", "2016-04-12", 
    "2016-04-27", "2016-04-27", "2016-5-16", "2016-4-16", "2016-5-16"
    ), Cal = c(1347L, 1347L, 1348L, 1496L, 1497L, 1496L, 1688L, 
    1688L, 1688L)), class = "data.frame", row.names = c(NA, -9L))

CodePudding user response:

Using diff.

df <- transform(df, Verify=c(0, diff(Cal)) == 1)
df
#           Id       date  Cal Verify
# 1 1503960366 2016-04-27 1347  FALSE
# 2 1503960366 2016-04-12 1347  FALSE
# 3 1503960366 2016-04-27 1348   TRUE
# 4 4319703577 2016-04-12 1496  FALSE
# 5 4319703577 2016-04-27 1497   TRUE
# 6 4319703577 2016-04-27 1496  FALSE
# 7 5553957443  2016-5-16 1688  FALSE
# 8 5553957443  2016-4-16 1688  FALSE
# 9 5553957443  2016-5-16 1688  FALSE
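If the difference should restart at each Id rather than run across the whole column, `diff()` can be combined with base R's `ave()`, which applies a function within groups and returns the results in the original row order. This is a sketch under that assumption; the helper name `step` is hypothetical.

```r
# Same sample data as in the question (date column omitted for brevity).
df <- data.frame(
  Id  = c(1503960366, 1503960366, 1503960366,
          4319703577, 4319703577, 4319703577,
          5553957443, 5553957443, 5553957443),
  Cal = c(1347L, 1347L, 1348L, 1496L, 1497L, 1496L, 1688L, 1688L, 1688L)
)

# For each Id, compute the row-to-row change in Cal.
# The first row of every Id has no predecessor, so its change is set to 0.
step <- ave(df$Cal, df$Id, FUN = function(x) c(0L, diff(x)))

# A row is verified when Cal rose by exactly 1 within its Id.
df$Verify <- step == 1

print(df)
```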