Hi, I have a data frame (df) like this:
DATE
1988100
1988110
1988120
1988130
1988140
1988150
1988160
1988170
1988180
1988190
1989100
1989110
1989120
1989130
1989140
1989150
1989160
1989170
1989180
1989190
.......
I would like to subset all the rows where the last two digits of the date are greater than 30 and lower than 60.
Output:
DATE
1988140
1988150
1988160
1989140
1989150
1989160
.......
I can do this:
df[df$DATE > 1988130 & df$DATE < 1988170, ]
df[df$DATE > 1989130 & df$DATE < 1989170, ]
However, hard-coding a range for every year does not scale. How can I write a regex that scans the whole DATE column and subsets all the dates whose last two digits are above or below certain values? I will also have dates such as 2012250.
Answer:
Use the modulo operator (%%) here; DATE %% 100 gives the last two digits:
df[df$DATE %% 100 > 30 & df$DATE %% 100 < 60, ]
Answer:
You could use a regular expression with grepl(). The pattern matches five digits followed by 40-59 or 31-39:
d[grepl("\\d{5}([4-5][0-9]|[3][1-9])", d$V1), ]
# V1 x
# 5 1988131 0.4214841
# 6 1988140 0.6267195
# 7 1988150 0.9193493
# 16 1989140 0.4670599
# 17 1989150 0.7745778
# 18 1989159 0.5258432
# 23 2012250 0.6576061
Alternatively, as a raw string (raw strings require R >= 4.0.0):
d[grepl(r"{\d{5}([4-5][0-9]|[3][1-9])}", d$V1), ]
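The character classes above hard-code the range 31-59, so different thresholds mean rewriting the pattern. A hedged variation, assuming the values are plain integers: use the regex only to capture the last two digits, then compare numerically. The vector v and the bounds lo and hi are hypothetical names chosen for this sketch.

```r
# A few demo values (taken from the d$V1 column of the demo data).
v <- c(1988130L, 1988131L, 1988140L, 1988159L, 1988160L, 2012250L)

# Capture the final two digits; "$" anchors the match at the end of the string.
# sub() converts the integers to character before matching.
last2 <- as.integer(sub("^.*(\\d{2})$", "\\1", v))

lo <- 30  # exclusive lower bound (hypothetical parameter)
hi <- 60  # exclusive upper bound (hypothetical parameter)
keep <- last2 > lo & last2 < hi

v[keep]
# [1] 1988131 1988140 1988159 2012250
```

For these values, keep matches what grepl() returns for an anchored version of the pattern, "^\\d{5}(3[1-9]|[45][0-9])$", while lo and hi can be changed without touching the regex.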
Data used in the demo:
d <- structure(list(V1 = c(1988100L, 1988110L, 1988120L, 1988130L,
1988131L, 1988140L, 1988150L, 1988160L, 1988170L, 1988180L, 1988190L,
1989100L, 1989110L, 1989120L, 1989130L, 1989140L, 1989150L, 1989159L,
1989160L, 1989170L, 1989180L, 1989190L, 2012250L, 2012260L, 2012270L,
2012280L, 2012290L), x = c(0.161905547836795, 0.995759425219148,
0.0691612137015909, 0.352196655003354, 0.421484081307426, 0.626719528576359,
0.919349306030199, 0.273645391920581, 0.859082776354626, 0.479915214236826,
0.703950216760859, 0.738272070884705, 0.51616885769181, 0.625624869018793,
0.129869017750025, 0.467059890506789, 0.77457784046419, 0.525843213545159,
0.568261774023995, 0.995282813441008, 0.195851961849257, 0.188985655317083,
0.65760614303872, 0.965418175794184, 0.513075938913971, 0.288330526091158,
0.833630389068276)), class = "data.frame", row.names = c(NA,
-27L))
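As a quick cross-check (a sketch, not part of either original answer), the regex and the modulo filter select the same rows on the demo values. The V1 vector below rebuilds the same 27 values as d$V1 in compact form so the snippet runs on its own; by_regex and by_modulus are illustrative names.

```r
# Same 27 values as d$V1 above, written as year base + offsets.
V1 <- c(1988100L + c(0L, 10L, 20L, 30L, 31L, 40L, 50L, 60L, 70L, 80L, 90L),
        1989100L + c(0L, 10L, 20L, 30L, 40L, 50L, 59L, 60L, 70L, 80L, 90L),
        2012250L + c(0L, 10L, 20L, 30L, 40L))

# Row indices selected by the regex answer and by the modulo answer.
by_regex   <- which(grepl("\\d{5}([4-5][0-9]|[3][1-9])", V1))
by_modulus <- which(V1 %% 100 > 30 & V1 %% 100 < 60)

identical(by_regex, by_modulus)
# [1] TRUE
```

Both should give rows 5, 6, 7, 16, 17, 18 and 23, the same rows shown in the regex answer's output. With 7-digit values the unanchored pattern only fits the whole string, which is why the two agree here.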