Regex to subset rows with a certain range in R data frame

Hi, I have a data frame (df) like this:

I would like to subset all the rows where the last two digits of the date are higher than 30 and lower than 60.

Output:

I can do this:

df[df$DATE > 1988130 & df$DATE < 1988170, ]
df[df$DATE > 1989130 & df$DATE < 1989170, ]

However, how can I write a regex that scans the DATE column of the entire data frame and subsets all the dates whose last two digits are higher or lower than certain values? I will also have dates such as 2012250.

CodePudding user response：

Use the modulus here:

df[df$DATE %% 100 > 30 & df$DATE %% 100 < 60, ]
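
As a quick check of the modulus idea, here is a minimal sketch on a small made-up data frame. The values in `df` below are illustrative assumptions, not the asker's data; only the column name `DATE` comes from the question. `DATE %% 100` is the remainder after dividing by 100, which is the last two digits.

```r
# Hypothetical sample data; the column name DATE follows the question.
df <- data.frame(DATE = c(1988130L, 1988131L, 1988159L, 1988160L, 2012250L))

# Remainder after dividing by 100, i.e. the last two digits of each date.
last_two <- df$DATE %% 100

# Keep rows whose last two digits are strictly between 30 and 60.
res <- df[last_two > 30 & last_two < 60, , drop = FALSE]
print(res$DATE)
```

The boundary values 30 and 60 are excluded because the comparisons are strict; switch to `>=` and `<=` if they should be kept.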

CodePudding user response：

You could use this.

d[grepl("\\d{5}([4-5][0-9]|[3][1-9])", d$V1), ]
#         V1         x
# 5  1988131 0.4214841
# 6  1988140 0.6267195
# 7  1988150 0.9193493
# 16 1989140 0.4670599
# 17 1989150 0.7745778
# 18 1989159 0.5258432
# 23 2012250 0.6576061

Alternatively, as a raw string (available from R 4.0.0):

d[grepl(r"{\d{5}([4-5][0-9]|[3][1-9])}", d$V1), ]
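
One caveat worth sketching: the pattern above is not anchored, so it relies on every value having exactly seven digits. If that cannot be guaranteed, anchoring the two-digit alternation to the end of the string with `$` may be safer. The vector `dates` below is hypothetical, and the 8-digit entry is included only to show the difference.

```r
# Hypothetical values; the last entry has 8 digits and ends in "00".
dates <- c(1988131L, 1988160L, 2012250L, 19881400L)

# Unanchored: five digits followed by 31-59 may match in the middle of a longer number.
loose <- grepl("\\d{5}([4-5][0-9]|3[1-9])", dates)

# Anchored with $: only the final two digits are tested, whatever the length.
strict <- grepl("(3[1-9]|[45][0-9])$", dates)

print(data.frame(dates, loose, strict))
```

With seven-digit dates only, both patterns should select the same rows.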

See demo.

d <- structure(list(V1 = c(1988100L, 1988110L, 1988120L, 1988130L, 
1988131L, 1988140L, 1988150L, 1988160L, 1988170L, 1988180L, 1988190L, 
1989100L, 1989110L, 1989120L, 1989130L, 1989140L, 1989150L, 1989159L, 
1989160L, 1989170L, 1989180L, 1989190L, 2012250L, 2012260L, 2012270L, 
2012280L, 2012290L), x = c(0.161905547836795, 0.995759425219148, 
0.0691612137015909, 0.352196655003354, 0.421484081307426, 0.626719528576359, 
0.919349306030199, 0.273645391920581, 0.859082776354626, 0.479915214236826, 
0.703950216760859, 0.738272070884705, 0.51616885769181, 0.625624869018793, 
0.129869017750025, 0.467059890506789, 0.77457784046419, 0.525843213545159, 
0.568261774023995, 0.995282813441008, 0.195851961849257, 0.188985655317083, 
0.65760614303872, 0.965418175794184, 0.513075938913971, 0.288330526091158, 
0.833630389068276)), class = "data.frame", row.names = c(NA, 
-27L))
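
As a sanity check, the regex and the modulus approach can be compared on the `V1` values from the demo data. This is only a sketch: the vector `v1` repeats the values above so the snippet runs on its own, and the names `by_regex` and `by_mod` are made up for illustration.

```r
# V1 values copied from the demo data above.
v1 <- c(1988100L, 1988110L, 1988120L, 1988130L, 1988131L, 1988140L,
        1988150L, 1988160L, 1988170L, 1988180L, 1988190L, 1989100L,
        1989110L, 1989120L, 1989130L, 1989140L, 1989150L, 1989159L,
        1989160L, 1989170L, 1989180L, 1989190L, 2012250L, 2012260L,
        2012270L, 2012280L, 2012290L)

# Regex approach: five digits followed by a two-digit number from 31 to 59.
by_regex <- grepl("\\d{5}([4-5][0-9]|[3][1-9])", v1)

# Modulus approach: last two digits strictly between 30 and 60.
by_mod <- v1 %% 100 > 30 & v1 %% 100 < 60

# Row positions selected by each approach.
print(which(by_regex))
print(which(by_mod))
```

Both calls should print the same row positions as the output shown in the answer.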