Grabbing \n1 with grepl in R

I have a vector of strings that start either with \n1. or with \n followed directly by text, and I want to keep only those starting with \n1. Sample:

[1] "\n1. Morgenhanen matter"
[2] "\n1. Morgenstund har guld"
[3] "\nMorgensange for børn be"

but I can't seem to grab the sentences starting with \n1. Here's where I'm at:

grepl("^['\\\\']n1", df$text)

but it returns false for all sentences...

In the end I want to end up with something like

library(tidyverse)

df %>% 
   filter(...)  # keep only the sentences starting with \n1

Sorry, I'm just not the best at regex in R.

Answer:

You could do:

library(dplyr)

df %>%
  filter(grepl("^\\n1", text))

Output:

                       text
1   \n1. Morgenhanen matter
2 \n1. Morgenstund har guld

Data

df <- data.frame(text = c("\n1. Morgenhanen matter", 
                      "\n1. Morgenstund har guld", 
                      "\nMorgensange for børn be"))
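If you'd rather skip the regex, base R's startsWith() performs a fixed-prefix test. The sketch below is an alternative I'm suggesting, not part of the answer above; it assumes R >= 4.0.0, where data.frame() keeps strings as character by default (startsWith() rejects factors).

```r
# Sketch: the same filter without a regex, using base R's startsWith()
# Assumes R >= 4.0.0 so that text is a character column, not a factor
df <- data.frame(text = c("\n1. Morgenhanen matter",
                          "\n1. Morgenstund har guld",
                          "\nMorgensange for børn be"))

# startsWith() compares a literal prefix; "\n1." contains a real newline
keep <- startsWith(df$text, "\n1.")

# Keep the rows where keep is TRUE, still as a data frame
res <- df[keep, , drop = FALSE]
print(res)
```

Because the prefix is matched literally, the dot in "\n1." means a real dot here, with no escaping needed.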

Answer:

Note that the \n in your strings is a single newline character (\x0A), not a backslash followed by n. The ^['\\\\']n1 pattern matches

^ - start of string
['\\\\'] - a ' or \ char
n1 - an n1 string.

So your pattern requires a literal ' or \ as the first character and can never match a string that starts with a newline.
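To make the difference concrete, the snippet below runs the question's pattern against the sample strings and against a made-up string that contains a literal backslash followed by n (that string is hypothetical and not in the question's data). It also prints the raw byte of the first character of the real data.

```r
text <- c("\n1. Morgenhanen matter", "\n1. Morgenstund har guld",
          "\nMorgensange for børn be")

# The question's pattern: start of string, then ' or \, then "n1"
bad_pattern <- "^['\\\\']n1"

# Fails on every element: the first character is a newline, not ' or \
res_newline <- grepl(bad_pattern, text)
print(res_newline)  # FALSE FALSE FALSE

# Hypothetical input with a literal backslash + "n" (written "\\n" in R source)
literal <- "\\n1. Morgenhanen matter"
res_literal <- grepl(bad_pattern, literal)
print(res_literal)  # TRUE

# The first byte of the real data is 0x0a, i.e. a newline
first_byte <- charToRaw(substr(text[1], 1, 1))
print(first_byte)  # 0a
```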

You can use

grep("^\\n1", df$text, value=TRUE)

See the R demo:

text <- c( "\n1. Morgenhanen matter", "\n1. Morgenstund har guld", "\nMorgensange for børn be")
grep("^\\n1", text, value=TRUE)

Output:

[1] "\n1. Morgenhanen matter"   "\n1. Morgenstund har guld"
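grep() and grepl() take the same pattern but return different things: grep() gives the positions of the matches, grep(value = TRUE) gives the matching strings, and grepl() gives one TRUE/FALSE per element, which is the form dplyr's filter() expects. A short comparison on the same vector:

```r
text <- c("\n1. Morgenhanen matter", "\n1. Morgenstund har guld",
          "\nMorgensange for børn be")
pattern <- "^\\n1"

idx  <- grep(pattern, text)               # integer positions of the matches
vals <- grep(pattern, text, value = TRUE) # the matching strings themselves
lgl  <- grepl(pattern, text)              # one logical per element

# Indexing by positions or by the logical vector selects the same strings
stopifnot(identical(text[idx], vals), identical(text[lgl], vals))
print(idx)  # 1 2
print(lgl)  # TRUE TRUE FALSE
```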

In the demo, the R string literal "^\\n1" yields the regex pattern ^\n1, which matches

^ - start of string
\n - a newline
1 - a 1 char.
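Because both R's string parser and the regex engine understand \n, the pattern can be written in more than one way. The sketch below compares a few spellings; the perl = TRUE variant and the extra "\n10." string are my own additions for illustration. Note that an unescaped 1 followed by anything also matches "\n10.", so escaping the dot pins the match to "\n1." exactly.

```r
text <- c("\n1. Morgenhanen matter", "\n1. Morgenstund har guld",
          "\nMorgensange for børn be")

# Backslash-n reaches the regex engine, which reads it as a newline
a <- grepl("^\\n1", text)
# R's parser embeds a real newline, which the regex matches literally
b <- grepl("^\n1", text)
# PCRE (perl = TRUE) also understands \n
pcre <- grepl("^\\n1", text, perl = TRUE)

stopifnot(identical(a, b), identical(a, pcre))
print(a)  # TRUE TRUE FALSE

# Hypothetical extra element: "^\\n1" also matches "\n10. ..."
more <- c(text, "\n10. Extra line")
loose  <- grepl("^\\n1", more)
strict <- grepl("^\\n1\\.", more)  # \\. requires a literal dot after the 1
print(loose)   # TRUE TRUE FALSE TRUE
print(strict)  # TRUE TRUE FALSE FALSE
```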