Regular expression - get attribute value

Time:07-11

Can anyone help me with this regex? I've been trying for hours without success.

In this code:

<!DOCTYPE html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title></title>
  <link rel="stylesheet" href="style.css">
</head>

<body>
  <script type="text/javascript" src="myScript.js"></script>
</body>
</html>

"But not this"

I am trying to use a regex to get all the attribute values (shown in green):

  • fr,
  • utf-8,
  • viewport,
  • width=device-width, initial-scale=1.0,
  • stylesheet,
  • style.css,
  • text/javascript,
  • myScript.js

I have "(.*?)", which matches all of them but also matches "But not this". How can I do this?
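
To make the problem concrete, here is a minimal sketch in Python (the question does not name a language, so Python and the variable names are assumptions). It runs the pattern from the question on a shortened copy of the markup:

```python
import re

# Shortened copy of the markup from the question, plus the quoted
# sentence that should NOT be captured.
html = '''<html lang="fr">
<head>
  <meta charset="utf-8">
</head>
</html>

"But not this"'''

# A bare quoted-string pattern knows nothing about tags, so it also
# captures the quoted sentence that follows the markup.
matches = re.findall(r'"(.*?)"', html)
print(matches)
```

The last element of the list is the unwanted sentence, which is why the pattern needs some extra context (for example the = that precedes an attribute value).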

CodePudding user response:

Try it with

(?<=")[a-z]+

With a positive lookbehind, you search for lowercase letters (a-z) that directly follow a ".

See result: https://regex101.com/r/FaxkXM/1

Update

This will match all the attribute values in your HTML tags.

(?<==")[a-zA-Z0-9.=\-,\/ ]+

See the result: https://regex101.com/r/i5hoP3/1
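
As a rough check outside regex101, the updated pattern can be tried in Python. This is only a sketch: Python is an assumption, and so is the + quantifier after the character class (it is needed to match more than one character). The variable names are made up for the example.

```python
import re

# The full document from the question.
html = '''<!DOCTYPE html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title></title>
  <link rel="stylesheet" href="style.css">
</head>

<body>
  <script type="text/javascript" src="myScript.js"></script>
</body>
</html>

"But not this"'''

# The lookbehind (?<==") requires the two characters =" immediately
# before the match, so the free-standing quoted sentence is skipped.
pattern = re.compile(r'(?<==")[a-zA-Z0-9.=\-,\/ ]+')
values = pattern.findall(html)

for value in values:
    print(value)
```

The character class is a whitelist, so a value containing anything else (an underscore or a colon, say) would be cut short at that character.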

CodePudding user response:

I used the R package called stringr to do this.

library(stringr)

a <- '<code a="fr" b="en" c="sp" d="it">'

# Extract runs of lowercase letters enclosed in double quotes.
b <- unlist(str_extract_all(a, '(?<=\")[a-z]+(?=\")'))
print(b)
# [1] "fr" "en" "sp" "it"

CodePudding user response:

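This is the correct regex.

  • It works with all characters inside double quotes.
  • It only matches a quoted value that directly follows =, which is how it tells attribute values inside < > apart from other quoted text.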

Thanks @Reza Saadati for your help

(?<==)("[a-zA-Z0-9.\s\S\-']*?.*?)"
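
A small Python sketch of how this pattern behaves (Python and all variable names are assumptions, not part of the original answer). Note that the capturing group starts at the opening quote, so the sketch strips that first character. The shorter pattern at the end is a variation of my own, shown only for comparison.

```python
import re

html = '''<html lang="fr">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="style.css">
</head>
</html>

"But not this"'''

# The pattern from the answer, unchanged. Triple single quotes are used
# because the pattern contains both ' and ".
pattern = re.compile(r'''(?<==)("[a-zA-Z0-9.\s\S\-']*?.*?)"''')

# findall returns group 1, which still includes the opening quote.
captured = pattern.findall(html)
values = [c[1:] for c in captured]  # drop the leading "

# Hypothetical simpler variant: anything up to the next quote after =".
simpler = re.findall(r'(?<==")[^"]*', html)

print(values)
```

Both approaches give the same values on this input. Neither actually inspects the < > delimiters, so a quoted string preceded by = outside a tag would also match.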