I am writing a regex to detect exposed passwords, secrets, and keys in a YAML file. Below are two examples of such files. The data key is common to all my YAML files; the subkeys under the data key vary from file to file.
Scenario 1:
apiVersion: v1
data:
  PASSWORD: mypass123
  USER: {$USER_NAME}
metadata:
  Timestamp: 2021-03-31T14:29:09Z
Scenario 2:
apiVersion: v1
data:
  DATABASE: {$DATABASE_NAME}
  USER: {$USER_NAME}
  API_KEY: mykey456=
metadata:
  Timestamp: 2021-03-31T14:29:09Z
As you can see above, sensitive information is exposed for the keys PASSWORD and API_KEY. I need a regex that matches the data key and the exposed values of the sensitive keys.
import re
import sys
from ruamel.yaml import YAML
yaml_str = """\
apiVersion: v1
data:
  PASSWORD: mypass123
  USER: {$USER_NAME}
metadata:
  Timestamp: 2021-03-31T14:29:09Z
"""
regex = r'data:\s*-\s*\b[a-z0-9_ .\-,]:([a-z0-9=_\-]{1,4096})'
I have tried the above regex but it is not working. Any help is appreciated.
CodePudding user response:
What about just adding a list of keys that you think may be sensitive -- I don't think they would vary too widely? As an example:
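A minimal sketch of the key-list idea, assuming a hypothetical `SENSITIVE_KEYS` list and assuming that safe values always look like `{$NAME}` placeholders, as in the question. The names `SENSITIVE_RE` and `find_sensitive` are made up for illustration:

```python
import re

# Hypothetical list of key names to treat as sensitive; adjust to your files.
SENSITIVE_KEYS = ["PASSWORD", "API_KEY", "SECRET", "TOKEN"]

# Match "<KEY>: <value>" on a single line, skipping values that start
# with "{$" (assumed to be placeholders rather than real secrets).
SENSITIVE_RE = re.compile(
    r"^[ \t]*(?P<key>" + "|".join(map(re.escape, SENSITIVE_KEYS)) + r")[ \t]*:[ \t]*"
    r"(?!\{\$)(?P<value>\S+)[ \t]*$",
    re.MULTILINE,
)


def find_sensitive(text):
    """Return (key, value) pairs for sensitive keys that hold literal values."""
    return [(m.group("key"), m.group("value")) for m in SENSITIVE_RE.finditer(text)]


scenario_1 = """\
apiVersion: v1
data:
  PASSWORD: mypass123
  USER: {$USER_NAME}
metadata:
  Timestamp: 2021-03-31T14:29:09Z
"""

scenario_2 = """\
apiVersion: v1
data:
  DATABASE: {$DATABASE_NAME}
  USER: {$USER_NAME}
  API_KEY: mykey456=
metadata:
  Timestamp: 2021-03-31T14:29:09Z
"""

print(find_sensitive(scenario_1))  # [('PASSWORD', 'mypass123')]
print(find_sensitive(scenario_2))  # [('API_KEY', 'mykey456=')]
```

This only looks at single-line `KEY: value` pairs whose value contains no spaces, and it does not check that the key actually sits under data, so treat it as a starting point rather than a complete scanner.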
However, if you just want to grab the contents within the data: section, then I think @Thefourthbird has a good approach. Or even better, parse the yaml file itself.
CodePudding user response:
My guess is that YAML parsing is going to be much easier and more maintainable. For example:
import yaml

yaml_str = """\
apiVersion: v1
data:
  PASSWORD: mypass123
  USER: username123
metadata:
  Timestamp: 2021-03-31T14:29:09Z
"""

try:
    data = yaml.safe_load(yaml_str)
except yaml.YAMLError as exc:
    # handle exception ...
    pass

for field_of_interest in ["PASSWORD", "USER"]:
    print(data["data"][field_of_interest])
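Building on the parsing approach, here is a rough sketch of how the exposed values could be told apart from the placeholders once the file is loaded. It assumes the `{$NAME}` placeholder convention from the question. The helper `exposed_values` and the `PLACEHOLDER_RE` pattern are hypothetical names. As far as I know, an unquoted `{$USER_NAME}` is read by a YAML parser as a flow mapping (a dict with one key) rather than as a string, so the sketch skips dict values as well as quoted placeholder strings. To keep the example free of third-party imports, `parsed` is a hand-written stand-in for what `yaml.safe_load` would return for the data section of Scenario 2:

```python
import re

# Assumed placeholder convention from the question: {$NAME}.
PLACEHOLDER_RE = re.compile(r"^\{\$\w+\}$")


def exposed_values(doc):
    """Return {key: value} for entries under doc["data"] that hold literal scalars."""
    section = doc.get("data") or {}
    exposed = {}
    for key, value in section.items():
        # An unquoted {$NAME} arrives here as a dict (YAML flow mapping);
        # a quoted "{$NAME}" arrives as a string. Neither counts as a leak.
        if isinstance(value, dict):
            continue
        if isinstance(value, str) and PLACEHOLDER_RE.match(value):
            continue
        if value is not None:
            exposed[key] = value
    return exposed


# Stand-in for the parsed form of Scenario 2 (metadata omitted).
parsed = {
    "apiVersion": "v1",
    "data": {
        "DATABASE": {"$DATABASE_NAME": None},
        "USER": {"$USER_NAME": None},
        "API_KEY": "mykey456=",
    },
}

print(exposed_values(parsed))  # {'API_KEY': 'mykey456='}
```

In real use you would pass the result of `yaml.safe_load(yaml_str)` instead of `parsed`. Note that this flags every literal value under data, not only keys with sensitive-sounding names, so it may need to be combined with a key list.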