I have a vector of strings, each starting with either \n1. or \ntext, and I want to keep only those starting with \n1.
Sample:
[1] "\n1. Morgenhanen matter"
[2] "\n1. Morgenstund har guld"
[3] "\nMorgensange for børn be"
But I can't seem to grab those sentences starting with \n1. Here's where I'm at:
grepl("^['\\\\']n1", df$text)
but it returns FALSE for all sentences...
In the end I want something like:
library(tidyverse)
df %>%
filter(those sentences starting with \n1)
I'm sorry, I'm just not the best at regex in R...
CodePudding user response:
You could do:
library(dplyr)
df %>%
  filter(grepl("^\\n1", text))  # df is already supplied by the pipe
Output:
text
1 \n1. Morgenhanen matter
2 \n1. Morgenstund har guld
Data
df <- data.frame(text = c("\n1. Morgenhanen matter",
"\n1. Morgenstund har guld",
"\nMorgensange for børn be"))
CodePudding user response:
Note that the \n in your strings is a newline character, \x0A. The ^['\\\\']n1 pattern matches:
^ - start of string
['\\\\'] - a ' or \ char
n1 - an n1 string.
So, as you see, your pattern does not match a newline char.
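To see the mismatch concretely, here is a small sketch using the sample strings from the question. The `literal` string is a hypothetical input added only for contrast; it is not part of the question's data.

```r
# Sample strings from the question; "\n" here is a single newline character.
text <- c("\n1. Morgenhanen matter",
          "\n1. Morgenstund har guld",
          "\nMorgensange for børn be")

# The first character is a newline (code point 10), not a backslash (92).
first_code <- utf8ToInt(substr(text[1], 1, 1))
print(first_code)

# The original pattern expects ' or \ followed by "n1", so nothing matches.
original_hits <- grepl("^['\\\\']n1", text)
print(original_hits)

# Hypothetical input (assumption for illustration): a backslash and an "n"
# stored as two separate characters. This is what the original pattern fits.
literal <- "\\n1. Morgenhanen matter"
literal_hit <- grepl("^['\\\\']n1", literal)
print(literal_hit)
```

So the original pattern would only work if the data contained a literal backslash followed by n, which is not the case here.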
You can use
grep("^\\n1", df$text, value=TRUE)
See the R demo:
text <- c( "\n1. Morgenhanen matter", "\n1. Morgenstund har guld", "\nMorgensange for børn be")
grep("^\\n1", text, value=TRUE)
Output:
[1] "\n1. Morgenhanen matter" "\n1. Morgenstund har guld"
Here, "^\\n1" is a ^\n1 regex pattern that matches:
^ - start of string
\n - a newline
1 - a 1 char.
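As a rough sketch of two variations on the pattern above: the stricter regex with an escaped dot and the `startsWith()` call are additions for illustration, not part of the original answer. Both assume the prefix to keep is exactly a newline followed by "1.".

```r
text <- c("\n1. Morgenhanen matter",
          "\n1. Morgenstund har guld",
          "\nMorgensange for børn be")

# Pattern from the answer: a newline followed by 1 at the start of the string.
by_regex <- grepl("^\\n1", text)

# Stricter variant (assumption): also require the literal dot after the 1,
# so a string starting with "\n10" or "\n1a" would not match.
by_strict_regex <- grepl("^\\n1\\.", text)

# Base R alternative without regex: startsWith() compares a fixed prefix,
# so no characters need escaping beyond the usual string escape for newline.
by_prefix <- startsWith(text, "\n1.")

# Logical indexing keeps only the matching strings.
kept <- text[by_prefix]
print(kept)
```

On this sample all three logical vectors are TRUE, TRUE, FALSE; they would only differ on strings where the 1 is followed by something other than a dot.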