I tried to write a regex on regex101.com to match any kind of email address.
The general email address formats are like this:
The following pattern works on regex101.com when I want to select just the emails in a text. The regex101.com link is: https://regex101.com/r/UA6CTA/1
(\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$)
But when I use this in R with the grep command, even when I use \\ instead of \, it gives me "character(0)". The script is below:
emails <- c("[email protected]",
"[email protected]",
"[email protected]",
"invalid.edu",
"[email protected]",
"[email protected]")
emails[grep(pattern = r"(\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$)",
x=emails)]
The output in terminal is below:
emails[grep(pattern = r"((\w){1,25}(.|\w){1,25}@(\w){1,25}.
(\w){1,25}(.|\w|$)((\w){1,25}|$))",
x=emails)]
character(0)
Can anyone help me figure out what to do?
CodePudding user response:
I assume the regex used in regex101 was without double backslashes, like this:
(\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$)
However, this does not match the one in the R example, with or without extra escaping. In addition, the regex in the R example is marked as a raw string (r"..."), but in R a raw string also needs an opening and closing delimiter sequence (i.e. r"(...)"; more details in the R help, ?Quotes).
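To make the raw-string point concrete, here is a minimal sketch (assuming R 4.0.0 or later, where raw strings are available). The short pattern `\w+@` and the two sample strings are made up for illustration and are not from the question:

```r
# Both literals below describe the same four-character regex: \w+@
raw_pattern     <- r"(\w+@)"   # raw string: r"( ... )", backslash kept literally
escaped_pattern <- "\\w+@"     # ordinary string: each backslash must be doubled

identical(raw_pattern, escaped_pattern)
#> [1] TRUE

# Hypothetical inputs: only the first one contains word characters before "@"
grepl(raw_pattern, c("user@host", "no-at-sign"))
#> [1]  TRUE FALSE
```

The parentheses directly inside the quotes are delimiters of the raw string and are not part of the regex, which is why the pattern from the question needs one extra pair around it.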
emails <- c("[email protected]",
"[email protected]",
"[email protected]",
"invalid.edu",
"[email protected]",
"[email protected]")
emails[grep(pattern=r"((\w){1,25}(.|\w){1,25}@(\w){1,25}.(\w){1,25}(.|\w|$)((\w){1,25}|$))", x=emails)]
#> [1] "[email protected]" "[email protected]"
#> [3] "[email protected]" "[email protected]"
#> [5] "[email protected]"
Or without raw string:
emails[grep(pattern="(\\w){1,25}(.|\\w){1,25}@(\\w){1,25}.(\\w){1,25}(.|\\w|$)((\\w){1,25}|$)", x=emails)]
#> [1] "[email protected]" "[email protected]"
#> [3] "[email protected]" "[email protected]"
#> [5] "[email protected]"
Created on 2023-01-28 with reprex v2.0.2
CodePudding user response:
That is incredible. But the key point is that when you pass a regex to grep as a string, and you break the pattern="bla bla bla..." string onto the next line because of the R margin, the line break becomes part of the string. Below I describe the solution.
For instance, I want to save the string "Hello to programming lovers" into a string variable.
st<- "Hello to programming lovers"
st
the output:
[1] "Hello to programming lovers"
Now, for whatever reason, I write the same code on two lines instead of one.
st<- "Hello to
programming lovers"
st
the output:
[1] "Hello to \n programming lovers"
So it is natural that when I write the pattern on two lines, it contains a newline and grep gives me "character(0)".
emails[grep(pattern =r"((\w){1,25}(\.|\w){0,25}
(\w){1,25}@(\w){1,25}\.(\w){1,25}(\.|\w|$)((\w){1,25}|$))",x=emails)]
The output:
character(0)
Meanwhile, when you write the pattern on a single line, or build it with the "paste" command using sep="", it gives you the desired result.
This is simple but tricky!
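A minimal sketch of that workaround, assuming R 4.0.0 or later for the raw strings. The addresses are hypothetical placeholders, and paste0() is used as shorthand for paste(..., sep = ""):

```r
# Hypothetical sample data; the addresses are made up for illustration.
emails <- c("alice@example.com", "bob.smith@mail.example.org", "invalid.edu")

# Build the pattern from short pieces; paste0() joins them with no separator,
# so no newline ends up inside the regex.
pattern <- paste0(
  r"((\w){1,25}(\.|\w){0,25}(\w){1,25})",   # local part
  "@",
  r"((\w){1,25}\.(\w){1,25})",              # domain and first suffix
  r"((\.|\w|$)((\w){1,25}|$))"              # optional further suffix
)

# The same pattern typed across two lines keeps the line break as "\n".
broken <- r"((\w){1,25}(\.|\w){0,25}(\w){1,25}
@(\w){1,25}\.(\w){1,25}(\.|\w|$)((\w){1,25}|$))"

grepl("\n", pattern, fixed = TRUE)
#> [1] FALSE
grepl("\n", broken, fixed = TRUE)
#> [1] TRUE

emails[grep(pattern, emails)]
#> [1] "alice@example.com"          "bob.smith@mail.example.org"
emails[grep(broken, emails)]
#> character(0)
```

Splitting at group boundaries keeps each source line short while the regex itself stays on one logical line.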