Consider the following data frame (df):
ID | Text |
---|---|
1 | Example |
2 | Example - 1 |
3 | Example - 2 |
4 | Example - 3 |
5 | Example - 4 |
6 | Example - 5 |
7 | Example - NA |
8 | Text |
9 | Text - 10 |
10 | Text - 20 |
11 | Text - 30 |
12 | Text - 40 |
13 | Text - 50 |
14 | Text - 60 |
15 | Text - 70 |
16 | Text - 80 |
17 | Text - 90 |
18 | Text - 100 |
In the column "Text", I want to find all rows that contain the following pattern: whitespace, hyphen, whitespace, single digit.
In other words, I want to extract the following rows:
ID | Text |
---|---|
2 | Example - 1 |
3 | Example - 2 |
4 | Example - 3 |
5 | Example - 4 |
6 | Example - 5 |
Currently I use the grepl() function in combination with regular expressions. However, none of my attempts, such as
- df[which(grepl("s{1}-\s{1}\d{1}$", df$Text)),]
- df[which(grepl("\b\s{1}-\s{1}\d{1}\b$", df$Text)),]
have worked. Since I am a beginner in programming, I would be grateful for any advice. Thanks in advance.
CodePudding user response:
I would use the following regex pattern:
\s-\s\d(?!\d)
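This matches a hyphen with one whitespace character on each side, followed by a single digit. The negative lookahead (?!\d) requires that this digit is not followed by another digit, so it must be followed by either a non-digit character or the end of the input.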
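To see how the lookahead behaves, here is a minimal sketch that applies the pattern to a few strings. The vector `x` and its contents are hypothetical examples chosen to cover each case; they are not part of the original data.

```r
# Hypothetical sample strings, one per case of interest
x <- c("Example - 1",     # single digit at the end
       "Text - 10",       # two digits: the lookahead rejects this
       "Example - NA",    # no digit after the hyphen
       "Example",         # no hyphen at all
       "A - 1 and more")  # single digit followed by a non-digit

# perl = TRUE is needed because lookaheads such as (?!\\d) are PCRE syntax
hits <- grepl("\\s-\\s\\d(?!\\d)", x, perl = TRUE)
hits
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

Note that each backslash is doubled inside the R string literal, since R itself treats a single backslash as the start of an escape sequence.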
Full R code:
df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl=TRUE), ]
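For a reproducible check, the sketch below rebuilds the data frame from the question and applies the filter. The way `df` is constructed here is an assumption made for illustration; only the column names and values come from the question. It also shows a simpler alternative that anchors the digit to the end of the string, which should be enough when the digit is always the last character.

```r
# Rebuild the example data (construction is illustrative, values are from the question)
df <- data.frame(
  ID = 1:18,
  Text = c("Example", paste("Example -", c(1:5, "NA")),
           "Text", paste("Text -", seq(10, 100, by = 10))),
  stringsAsFactors = FALSE
)

# Lookahead version: needs perl = TRUE
result <- df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl = TRUE), ]

# Alternative without a lookahead: require the single digit at the end of the string
result_anchored <- df[grepl("\\s-\\s\\d$", df$Text), ]

result$ID
# [1] 2 3 4 5 6
```

The attempts in the question most likely failed because of the single backslashes: in an R string literal, "\s" and "\d" are unrecognized escapes and cause a parse error, so they have to be written as "\\s" and "\\d". The which() call is also unnecessary, since a logical vector can be used for row indexing directly.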