I tried this in regex101.com for email addresses, but when I apply it in R with grep it does not work

Time:01-30

I tried to write a regex on regex101.com that identifies any kind of email address.

The general email address formats are like this:

[email protected]

[email protected]

[email protected]

The regex below works on www.regex101.com when I want to select just the emails from a text. The regex101.com link is: https://regex101.com/r/UA6CTA/1

(\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$)

But when I use it in R with grep, even when I write \\ instead of \, it returns character(0). The script is below:

emails <- c("[email protected]",
"[email protected]",
"[email protected]",
"invalid.edu",
"[email protected]",
"[email protected]")
emails[grep(pattern = r"(\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$)",
x=emails)]

The output in terminal is below:

emails[grep(pattern = r"((\w){1,25}(.|\w){1,25}@(\w){1,25}.
              (\w){1,25}(.|\w|$)((\w){1,25}|$))",
              x=emails)]
character(0)

Can anyone tell me what to do?

CodePudding user response:

I assume the regex used in regex101 was without double backslashes, like this:

(\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$)

However, this does not match the pattern in the R example, with or without the extra escaping. In addition, the regex in the R example is marked as a raw string (r"..."), but a raw string in R also needs an opening and closing delimiter inside the quotes, i.e. r"(...)"; see ?Quotes in the R help for details.

emails <- c("[email protected]",
             "[email protected]",
             "[email protected]",
             "invalid.edu",
             "[email protected]",
             "[email protected]")

emails[grep(pattern=r"((\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$))", x=emails)]
#> [1] "[email protected]"     "[email protected]"       
#> [3] "[email protected]"     "[email protected]"       
#> [5] "[email protected]"

Or without raw string:

emails[grep(pattern="(\\w){1,25}(.|\\w){1,25}@(\\w){1,25}.(\\w){1,25}(.|\\w|$)((\\w){1,25}|$)", x=emails)]
#> [1] "[email protected]"     "[email protected]"       
#> [3] "[email protected]"     "[email protected]"       
#> [5] "[email protected]"

Created on 2023-01-28 with reprex v2.0.2
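
As a quick check that the two forms above really are the same pattern, here is a minimal sketch. It assumes R 4.0.0 or later (raw strings do not exist before that), and the variable names are only illustrative:

```r
# Raw-string form: the outer r"( and )" are delimiters, not part of the pattern.
raw_pat <- r"((\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$))"

# Ordinary string form: every backslash has to be doubled.
escaped_pat <- "(\\w){1,25}(.|\\w){1,25}@(\\w){1,25}.(\\w){1,25}(.|\\w|$)((\\w){1,25}|$)"

# Both literals produce exactly the same character string.
same_pattern <- identical(raw_pat, escaped_pat)
print(same_pattern)

# The regex engine only ever sees single backslashes.
cat(escaped_pat, "\n")
```

Printing a pattern with cat() instead of print() shows what grep actually receives, which makes stray characters easier to spot.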

CodePudding user response:

That is incredible. But the key point is this: when you pass the regex to grep as a string and break the line after pattern="bla bla bla..." because of the R margin, the line break changes the string. Below I describe the problem and the solution.

For instance, I want to save the text "Hello to programming lovers" in a string variable.

st<- "Hello to programming lovers"
st

The output:

[1] "Hello to programming lovers"

Now, for whatever reason, I write the same code on two lines instead of one.

st<- "Hello to 
    programming lovers"
st

The output:

[1] "Hello to \n    programming lovers"
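
One way to confirm that the line break ended up inside the string is to test for it directly. This is only a small sketch; the names st_one and st_two are made up for the illustration:

```r
st_one <- "Hello to programming lovers"

# The same text typed across two lines: the line break and the
# indentation are stored inside the string.
st_two <- "Hello to
    programming lovers"

# fixed = TRUE searches for a literal newline character.
has_newline <- grepl("\n", st_two, fixed = TRUE)
print(has_newline)

# The two strings are no longer equal.
print(identical(st_one, st_two))
```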

So it is natural that the grep call returns character(0) when I write the pattern on two lines: the line break and the indentation become part of the regex.

emails[grep(pattern =r"((\w){1,25}(\.|\w){0,25}
        (\w){1,25}@(\w){1,25}\.(\w){1,25}(\.|\w|$)((\w){1,25}|$))",x=emails)]

The output:

character(0)

Meanwhile, when you write the pattern on a single line, or build it from pieces with the paste command and sep="", it gives the desired result.
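
Here is a minimal sketch of the paste approach, assuming R 4.0.0 or later for the raw strings. The addresses are made-up sample data (the originals are not readable on this page), and the split points in the pattern are my own choice:

```r
# Hypothetical sample data, for illustration only.
emails <- c("john.doe@example.com", "jane@example.org", "invalid.edu")

# Pattern broken inside one literal: the newline and the leading
# spaces of the second line become part of the regex.
broken <- r"((\w){1,25}(\.|\w){0,25}
        (\w){1,25}@(\w){1,25}\.(\w){1,25}(\.|\w|$)((\w){1,25}|$))"

# Same pattern built from separate pieces, so the source code can
# wrap freely without changing the regex.
pattern <- paste(
  r"((\w){1,25}(\.|\w){0,25})",
  r"((\w){1,25}@(\w){1,25}\.(\w){1,25})",
  r"((\.|\w|$)((\w){1,25}|$))",
  sep = ""
)

broken_hits <- emails[grep(broken, emails)]
good_hits   <- emails[grep(pattern, emails)]

print(broken_hits)  # no address contains a newline plus spaces
print(good_hits)    # the two addresses that contain an @
```

paste0(...) should behave the same as paste(..., sep = ""); either way each piece stays on its own line in the script while grep receives one unbroken pattern.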

This is simple but tricky!
