R - Regular Expression - Match the following pattern: WhitespaceHyphenWhitespaceSingledigit

Time:08-02

Consider the following data structure (df):

ID Text
1 Example
2 Example - 1
3 Example - 2
4 Example - 3
5 Example - 4
6 Example - 5
7 Example - NA
8 Text
9 Text - 10
10 Text - 20
11 Text - 30
12 Text - 40
13 Text - 50
14 Text - 60
15 Text - 70
16 Text - 80
17 Text - 90
18 Text - 100

In the column "Text", I want to find all rows that contain the following pattern: WhitespaceHyphenWhitespaceSingledigit

Or in other words, I want to extract the following rows:

ID Text
2 Example - 1
3 Example - 2
4 Example - 3
5 Example - 4
6 Example - 5

Currently I use the grepl() function in combination with regular expressions. However, none of my attempts, such as

  • df[which(grepl("\s{1}-\s{1}\d{1}$", df$Text)),]
  • df[which(grepl("\b\s{1}-\s{1}\d{1}\b$", df$Text)),]

have worked. Since I am a beginner in programming, I would be grateful for any advice. Thanks in advance.

CodePudding user response:

I would use the following regex pattern:

\s-\s\d(?!\d)

This matches a hyphen between two whitespace characters, followed by a single digit that is itself followed by either a non-digit character or the end of the input. The negative lookahead (?!\d) is what rejects values such as "10" or "100".

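Full R code (backslashes are doubled inside the R string, and perl=TRUE is needed because R's default regex engine does not support lookaheads):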

df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl=TRUE), ]
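
For anyone who wants to try this, here is a minimal, self-contained sketch that rebuilds the example data and applies the filter. The construction of df is an assumption based on the table in the question (the real data may be built differently), and the name matches is only illustrative. The second variant is not from the answer above: it assumes the digit always sits at the very end of the string, in which case a plain end anchor works without perl=TRUE.

```r
# Rebuild the example data from the question (assumed construction).
df <- data.frame(
  ID = 1:18,
  Text = c(
    "Example",
    paste("Example -", 1:5),
    "Example - NA",
    "Text",
    paste("Text -", seq(10, 100, by = 10))
  ),
  stringsAsFactors = FALSE
)

# Variant 1: negative lookahead, requires the PCRE engine (perl = TRUE).
# Backslashes are doubled because they sit inside an R string literal.
matches <- df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl = TRUE), ]

# Variant 2 (assumption: the digit is the last character of the string):
# an end anchor instead of a lookahead, works with the default engine.
matches_anchored <- df[grepl("\\s-\\s\\d$", df$Text), ]

print(matches)
```

On this data both variants select the same rows. They differ on strings where something other than a digit follows the single digit: the lookahead version still matches those, the anchored version does not.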