Can anyone help me with this regex? I've been trying for hours without success.
In this code:
<!DOCTYPE html>
<html lang="fr">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title></title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<script type="text/javascript" src="myScript.js"></script>
</body>
</html>
"But not this"
I'm trying to use a regex to get all the attribute values (highlighted in green):
- fr,
- utf-8,
- viewport,
- width=device-width, initial-scale=1.0,
- stylesheet,
- style.css,
- text/javascript,
- myScript.js
I have "(.*?)"
which matches all of them, but it also matches "But not this"...
How can I do this?
CodePudding user response:
Try it with
(?<=")[a-z]+
The positive lookbehind (?<=") makes the pattern match a run of lowercase letters (a-z) that comes right after a ".
See result: https://regex101.com/r/FaxkXM/1
Update
This will match all the attribute values in your HTML tags, because the lookbehind now requires =" before the value:
(?<==")[a-zA-Z0-9.=\-,\/ ]+
See the result: https://regex101.com/r/i5hoP3/1
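To check the updated pattern outside regex101, here is a minimal Python sketch run against the HTML from the question. It assumes a + quantifier after the character class (so that a whole value is matched rather than a single character) and uses Python's re module, which supports this fixed-width lookbehind. The variable names are only illustrative.

```python
import re

# Sample document from the question, including the trailing quoted line
# that must NOT be matched.
html = '''<!DOCTYPE html>
<html lang="fr">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title></title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<script type="text/javascript" src="myScript.js"></script>
</body>
</html>
"But not this"
'''

# (?<==") is a lookbehind: the match must start right after the two
# characters =" but those characters are not part of the match.
# The trailing + is an assumption added here so a full value is consumed.
pattern = re.compile(r'(?<==")[a-zA-Z0-9.=\-,\/ ]+')

values = pattern.findall(html)
for value in values:
    print(value)
```

The last line of the sample is skipped because its opening quote is not preceded by =. Note that the character class is a whitelist, so a value containing any other character (for example an underscore) would be cut short.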
CodePudding user response:
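I used the R package stringr to do this.
library(stringr)
# Sample tag with four quoted attribute values
a <- '<code a="fr" b="en" c="sp" d="it">'
# The lookbehind and lookahead keep the quotes themselves out of each match
b <- str_extract_all(a, '(?<=\")[a-z]+(?=\")') %>% unlist()
> print(b)
[1] "fr" "en" "sp" "it"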
CodePudding user response:
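This is the regex that worked for me.
- It works with any characters inside the double quotes.
- It only matches a quoted value that directly follows an = sign, which in this sample only happens inside < >.
Thanks @Reza Saadati for your help.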
(?<==)("[a-zA-Z0-9.\s\S\-']*?.*?)"
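For anyone who wants to try this outside a regex tester, here is a rough Python sketch. It runs the pattern above unchanged on a shortened, hypothetical sample and shows one detail worth knowing: the capturing group starts at the opening quote, so that quote has to be stripped afterwards. The second pattern, (?<==")[^"]*, is not from this thread; it is an assumed simpler variant that should give the same values on this kind of input.

```python
import re

# Shortened sample (hypothetical), with a quoted line outside any tag.
sample = '''<html lang="fr">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="style.css">
"But not this"
'''

# Pattern from the answer, unchanged. Group 1 begins with the opening quote.
accepted = re.compile(r'''(?<==)("[a-zA-Z0-9.\s\S\-']*?.*?)"''')
raw = accepted.findall(sample)

# Drop the leading quote that the group captured.
values = [item[1:] for item in raw]

# Assumed alternative: look behind for =" and take everything up to the
# next double quote.
simpler = re.compile(r'(?<==")[^"]*')
values_simpler = simpler.findall(sample)

print(values)
print(values_simpler)
```

Both versions rely only on the = before the quote, so they would also pick up a quoted string after = in plain text or inside a script body.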