regex to subset rows with a certain range in R data frame

Time:02-18

Hi, I have a data frame (df) like this:

DATE
1988100
1988110
1988120
1988130
1988140
1988150
1988160
1988170
1988180
1988190
1989100
1989110
1989120
1989130
1989140
1989150
1989160
1989170
1989180
1989190
.......

I would like to subset all the rows where the last two digits of the date are higher than 30 and lower than 60.

Output:

DATE
1988140
1988150
1988160
1989140
1989150
1989160
.......

I can do this:

df[df$DATE>1988130 & df$DATE<1988170,]
df[df$DATE>1989130 & df$DATE<1989170,]

However, how can I write a regex that scans the whole DATE column and subsets all the dates whose last two digits are higher or lower than certain values? I will also have dates such as 2012250.

CodePudding user response:

Use the modulus here:

df[df$DATE %% 100 > 30 & df$DATE %% 100 < 60, ]
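
As a minimal sketch of the modulus approach, assuming DATE is stored as a number: `DATE %% 100` is the remainder after dividing by 100, which is the last two digits, so the same comparison works for any year prefix. The data frame below is a small made-up sample in the shape of the question's data, and `last_two`, `strict` and `inclusive` are names introduced here for illustration.

```r
# Made-up sample in the shape of the question's data (not the asker's real data).
df <- data.frame(DATE = c(1988130, 1988140, 1988150, 1988160, 1988170, 2012250))

# DATE %% 100 is the remainder after dividing by 100, i.e. the last two digits.
last_two <- df$DATE %% 100

# Strictly between 30 and 60, as the question is worded.
strict <- df[last_two > 30 & last_two < 60, , drop = FALSE]

# Inclusive upper bound, which matches the question's expected output (it lists ...160).
inclusive <- df[last_two > 30 & last_two <= 60, , drop = FALSE]
```

The question's wording says "lower than 60" while its expected output includes the rows ending in 60, so which of the two comparisons is wanted depends on what the asker actually needs.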

CodePudding user response:

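You could use grepl() with a pattern that matches five digits followed by 31 to 59 (this assumes seven-digit values, as in the question):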

d[grepl("\\d{5}([4-5][0-9]|[3][1-9])", d$V1), ]
#         V1         x
# 5  1988131 0.4214841
# 6  1988140 0.6267195
# 7  1988150 0.9193493
# 16 1989140 0.4670599
# 17 1989150 0.7745778
# 18 1989159 0.5258432
# 23 2012250 0.6576061

Alternatively, as a raw string (requires R 4.0.0 or later), which avoids doubling the backslash:

d[grepl(r"{\d{5}([4-5][0-9]|[3][1-9])}", d$V1), ]
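
If the bounds are likely to change, one option is to pull out the last two digits with `sub()` and compare them as numbers, rather than encoding the range in the pattern. This is only a sketch, assuming the values are integers or character strings (so that `as.character()` yields plain digits) with at least two trailing digits. `last_two_digits` is a hypothetical helper and `dates` is a made-up vector standing in for `d$V1`.

```r
# Hypothetical helper: return the last two digits of each value as integers.
last_two_digits <- function(x) {
  # "^.*(\\d{2})$" captures the final two digits; "\\1" keeps only that group.
  as.integer(sub("^.*(\\d{2})$", "\\1", as.character(x)))
}

# Made-up values standing in for d$V1.
dates <- c(1988130L, 1988131L, 1988140L, 1988159L, 1988160L, 2012250L)
tail2 <- last_two_digits(dates)

# Same condition as the pattern above: last two digits from 31 to 59.
kept <- dates[tail2 > 30 & tail2 < 60]

# Equivalent anchored pattern: "$" pins the two digits to the end of the value,
# so it does not depend on the number of leading digits.
kept_regex <- dates[grepl("(3[1-9]|[45][0-9])$", dates)]
```

With the numeric comparison, changing the range only means changing 30 and 60; with the pattern, the character classes have to be rewritten for each new range.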

Data:

d <- structure(list(V1 = c(1988100L, 1988110L, 1988120L, 1988130L, 
1988131L, 1988140L, 1988150L, 1988160L, 1988170L, 1988180L, 1988190L, 
1989100L, 1989110L, 1989120L, 1989130L, 1989140L, 1989150L, 1989159L, 
1989160L, 1989170L, 1989180L, 1989190L, 2012250L, 2012260L, 2012270L, 
2012280L, 2012290L), x = c(0.161905547836795, 0.995759425219148, 
0.0691612137015909, 0.352196655003354, 0.421484081307426, 0.626719528576359, 
0.919349306030199, 0.273645391920581, 0.859082776354626, 0.479915214236826, 
0.703950216760859, 0.738272070884705, 0.51616885769181, 0.625624869018793, 
0.129869017750025, 0.467059890506789, 0.77457784046419, 0.525843213545159, 
0.568261774023995, 0.995282813441008, 0.195851961849257, 0.188985655317083, 
0.65760614303872, 0.965418175794184, 0.513075938913971, 0.288330526091158, 
0.833630389068276)), class = "data.frame", row.names = c(NA, 
-27L))