I am fairly new to R and am trying to find a way to verify a sequence within a column. I tried seq(), but it did not get me very far.
Here is a sample of the df:
gp <- data.frame(Id = c(1503960366, 1503960366, 1503960366, 4319703577, 4319703577, 4319703577, 5553957443, 5553957443, 5553957443),
                 date = c("2016-04-27", "2016-04-12", "2016-04-27", "2016-04-12", "2016-04-27", "2016-04-27", "2016-5-16", "2016-4-16", "2016-5-16"),
                 Cal = c(1347, 1347, 1348, 1496, 1497, 1496, 1688, 1688, 1688))
The sequence is in column Cal. Within an Id, Cal sometimes increases by 1 from one row to the next. I want to find those increases and create a new column that is TRUE where Cal is exactly 1 greater than in the previous row, and FALSE otherwise.
##This is the printed-out version of the df.
Id date Cal
<dbl> <chr> <dbl>
1 1503960366 2016-04-27 1347
2 1503960366 2016-04-12 1347
3 1503960366 2016-04-27 1348
4 4319703577 2016-04-12 1496
5 4319703577 2016-04-27 1497
6 4319703577 2016-04-27 1496
7 5553957443 2016-5-16 1688
8 5553957443 2016-4-16 1688
9 5553957443 2016-5-16 1688
##This is the outcome I am looking for
Id date Cal Verify
<dbl> <chr> <dbl> <lgl>
1 1503960366 2016-04-27 1347 False
2 1503960366 2016-04-12 1347 False
3 1503960366 2016-04-27 1348 True
4 4319703577 2016-04-12 1496 False
5 4319703577 2016-04-27 1497 True
6 4319703577 2016-04-27 1496 False
7 5553957443 2016-5-16 1688 False
8 5553957443 2016-4-16 1688 False
9 5553957443 2016-5-16 1688 False
Any help or a pointer in the right direction would be greatly appreciated. Thanks in advance.
CodePudding user response:
Subtract the previous Cal value from the current one and check whether the difference is equal to 1.
library(dplyr)
df %>%
  mutate(Verify = Cal - lag(Cal, default = 0) == 1)
# Id date Cal Verify
#1 1503960366 2016-04-27 1347 FALSE
#2 1503960366 2016-04-12 1347 FALSE
#3 1503960366 2016-04-27 1348 TRUE
#4 4319703577 2016-04-12 1496 FALSE
#5 4319703577 2016-04-27 1497 TRUE
#6 4319703577 2016-04-27 1496 FALSE
#7 5553957443 2016-5-16 1688 FALSE
#8 5553957443 2016-4-16 1688 FALSE
#9 5553957443 2016-5-16 1688 FALSE
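Note that this compares every row with the row directly above it, whatever its Id, so a step of 1 across an Id boundary would also be flagged. That does not happen in the sample data, but if the check should restart for each Id, a grouped variant might look like the sketch below. The `toy` data frame is hypothetical and chosen so that the boundary case shows up; `coalesce()` turns the NA produced for the first row of each group into FALSE.

```r
library(dplyr)

# Hypothetical data: the last Cal of Id 1 and the first Cal of Id 2 differ by 1
toy <- data.frame(Id = c(1, 1, 2, 2), Cal = c(10, 11, 12, 12))

# Ungrouped: the first row of Id 2 is compared with the last row of Id 1
ungrouped <- toy %>%
  mutate(Verify = Cal - lag(Cal, default = 0) == 1)

# Grouped: lag() restarts within each Id, so the first row of a group is NA
grouped <- toy %>%
  group_by(Id) %>%
  mutate(Verify = coalesce(Cal - lag(Cal) == 1, FALSE)) %>%
  ungroup()

ungrouped$Verify
# FALSE TRUE TRUE FALSE
grouped$Verify
# FALSE TRUE FALSE FALSE
```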
In base R -
df$Verify <- c(FALSE, df$Cal[-1] - df$Cal[-nrow(df)] == 1)
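The one-liner above treats the whole column as a single sequence. If the comparison should never cross from one Id into the next, one possible base R sketch uses ave(), which applies a function within each Id and returns the results in the original row order. The `toy` data frame is made up for illustration only.

```r
# Hypothetical data: Id 1 ends at 11 and Id 2 starts at 12
toy <- data.frame(Id = c(1, 1, 2, 2), Cal = c(10, 11, 12, 12))

# Within each Id, subtract the previous value from the current one;
# the first row of each Id has no predecessor, so its step is set to 0
step <- ave(toy$Cal, toy$Id,
            FUN = function(x) c(0, x[-1] - x[-length(x)]))

toy$Verify <- step == 1
toy$Verify
# FALSE TRUE FALSE FALSE
```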
Data:
df <- structure(list(Id = c(1503960366, 1503960366, 1503960366, 4319703577,
4319703577, 4319703577, 5553957443, 5553957443, 5553957443),
date = c("2016-04-27", "2016-04-12", "2016-04-27", "2016-04-12",
"2016-04-27", "2016-04-27", "2016-5-16", "2016-4-16", "2016-5-16"
), Cal = c(1347L, 1347L, 1348L, 1496L, 1497L, 1496L, 1688L,
1688L, 1688L)), class = "data.frame", row.names = c(NA, -9L))
CodePudding user response:
Using diff.
df <- transform(df, Verify=c(0, diff(Cal)) == 1)
df
# Id date Cal Verify
# 1 1503960366 2016-04-27 1347 FALSE
# 2 1503960366 2016-04-12 1347 FALSE
# 3 1503960366 2016-04-27 1348 TRUE
# 4 4319703577 2016-04-12 1496 FALSE
# 5 4319703577 2016-04-27 1497 TRUE
# 6 4319703577 2016-04-27 1496 FALSE
# 7 5553957443 2016-5-16 1688 FALSE
# 8 5553957443 2016-4-16 1688 FALSE
# 9 5553957443 2016-5-16 1688 FALSE
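If what is wanted is a single TRUE/FALSE per Id rather than per row (the question can be read either way), diff can also be applied within each Id. This is only a sketch under that assumption; `has_step` is a name introduced here, and the data frame repeats the Id and Cal columns from the question.

```r
df <- data.frame(
  Id  = c(1503960366, 1503960366, 1503960366, 4319703577, 4319703577,
          4319703577, 5553957443, 5553957443, 5553957443),
  Cal = c(1347L, 1347L, 1348L, 1496L, 1497L, 1496L, 1688L, 1688L, 1688L)
)

# For each Id, check whether any consecutive pair of Cal values differs by exactly 1
has_step <- tapply(df$Cal, df$Id, function(x) any(diff(x) == 1))
has_step
# 1503960366 4319703577 5553957443
#       TRUE       TRUE      FALSE
```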