Regex for detecting keys in the yaml file

Time:02-10

I am writing a regex to detect exposed passwords, secrets, and keys in a YAML file. Below are two example files. The top-level data key is common to all my YAML files; the subkeys under data vary from file to file.

Scenario 1:

apiVersion: v1
data:
    PASSWORD: mypass123
    USER: {$USER_NAME}
metadata:
    Timestamp: 2021-03-31T14:29:09Z

Scenario 2:

apiVersion: v1
data:
    DATABASE: {$DATABASE_NAME}
    USER: {$USER_NAME}
    API_KEY: mykey456=
metadata:
    Timestamp: 2021-03-31T14:29:09Z

As you can see above, sensitive information is exposed for the keys PASSWORD and API_KEY. I need a regex that matches the data key and the exposed values of the sensitive entries.

import re
import sys
from ruamel.yaml import YAML

yaml_str = """\
apiVersion: v1
data:
    PASSWORD: mypass123
    USER: {$USER_NAME}
metadata:
    Timestamp: 2021-03-31T14:29:09Z
"""

regex = r'data:\s*-\s*\b[a-z0-9_ .\-,]:([a-z0-9=_\-]{1,4096})'

I have tried the regex above, but it does not match anything. Any help is appreciated.

CodePudding user response:

What about just matching against a list of key names that you think may be sensitive? I don't think they would vary too widely.

    However, if you just want to grab the contents of the data: section, then I think @Thefourthbird has a good approach. Or, even better, parse the YAML file itself.
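
A rough sketch of the key-list idea, limited to the data: block, might look like the following. The fragment list and the helper name find_exposed are made up for illustration, and the rule that a value starting with { is a template placeholder is an assumption based only on the samples in the question. Blank lines or comments inside the block are not handled.

```python
import re

# Hypothetical fragments that mark a key name as sensitive; extend as needed.
SENSITIVE_FRAGMENTS = ("PASSWORD", "SECRET", "TOKEN", "KEY")

# The indented lines that directly follow a top-level "data:" line.
DATA_BLOCK_RE = re.compile(
    r"^data:[ \t]*\n((?:[ \t]+\S.*(?:\n|\Z))*)", re.MULTILINE
)

# An indented "NAME: value" line whose name contains a sensitive fragment.
# The value may not start with "{", which skips {$PLACEHOLDER} entries
# (an assumption taken from the sample files).
ENTRY_RE = re.compile(
    r"^[ \t]+(?P<key>\w*(?:" + "|".join(SENSITIVE_FRAGMENTS) + r")\w*)"
    r"[ \t]*:[ \t]*(?P<value>[^\s{][^\n]*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_exposed(text):
    """Return (key, value) pairs under data: that look like literal secrets."""
    block = DATA_BLOCK_RE.search(text)
    if block is None:
        return []
    return [(m["key"], m["value"]) for m in ENTRY_RE.finditer(block.group(1))]
```

Splitting the work into two patterns (first isolate the data: block, then scan its lines) keeps each regex short and avoids flagging similarly named keys under metadata: or other sections.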

    CodePudding user response:

    My guess is that YAML parsing is going to be much easier and more maintainable. For example:

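    import yaml  # PyYAML

    yaml_str = """\
    apiVersion: v1
    data:
        PASSWORD: mypass123
        USER: username123
    metadata:
        Timestamp: 2021-03-31T14:29:09Z
    """

    try:
        data = yaml.safe_load(yaml_str)
    except yaml.YAMLError as exc:
        # Stop here: without a parsed document there is nothing to inspect.
        raise SystemExit(f"Could not parse YAML: {exc}")

    # Print the fields of interest, skipping any that are absent.
    for field_of_interest in ["PASSWORD", "USER"]:
        if field_of_interest in data.get("data", {}):
            print(data["data"][field_of_interest])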
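
To go from printing known fields to flagging exposed ones, one option is to walk the parsed data mapping and keep only literal scalar values. This is a sketch under assumptions, not a definitive check: the helper name exposed_values is hypothetical, and it relies on the parser returning a placeholder such as {$USER_NAME} as a nested mapping (YAML flow-mapping syntax) rather than a string, which is how I would expect PyYAML to load the files in the question.

```python
def exposed_values(doc):
    """Return {key: value} for entries under 'data' that hold a literal scalar."""
    section = doc.get("data") if isinstance(doc, dict) else None
    if not isinstance(section, dict):
        return {}
    exposed = {}
    for key, value in section.items():
        # Unquoted {$NAME} placeholders are assumed to arrive as nested
        # mappings, so only plain scalars are treated as hard-coded values.
        if isinstance(value, (str, int, float, bool)):
            exposed[key] = value
    return exposed
```

With the snippet above this would be called as exposed_values(data). Note that it flags every literal under data, including USER: username123, so it may still need to be combined with a list of sensitive key names.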
    