Regex to match Python docstrings

Time:05-15

I would like to parse Python docstrings as follows:

Summary of class that is multiple lines.

Parameters
----------
param1 : str
    Param 1 is a param

Returns
-------
value : str

Examples
--------
>>> print()

Maps to

{
    'base': 'Summary of class that is multiple lines.',
    'params': 'param1 : str\n\tParam 1 is a param',
    'returns': 'value : str',
    'examples': '>>> print()'
}

This is pretty straightforward to do with named groups and re.Match.groupdict(), but the issue I am running into is that each of these four groups is optional. There are several questions on here about optional groups, and one in particular seems relevant, but in that case there are convenient ending characters to break things up. Here, the docstring can contain any characters (I am currently matching them with [\s\S]).
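To illustrate the behavior in question, here is a minimal sketch of an optional named group. The toy pattern and the names `key` and `value` are assumptions made up for this example, not part of the docstring problem. A group that does not take part in the match is reported as None by groupdict().

```python
import re

# Hypothetical toy pattern: "key" is required, the "=value" part is optional.
pattern = re.compile(r"^(?P<key>\w+)(?:=(?P<value>\w+))?$")

with_value = pattern.match("a=1").groupdict()
without_value = pattern.match("a").groupdict()

print(with_value)     # the optional group took part in the match
print(without_value)  # the optional group is reported as None
```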

CodePudding user response:

I think this should work:

^(?P<base>[\s\S]+?)??(?:(?:^|\n\n)Parameters\n-{10}\n(?P<params>[\s\S]*?))?(?:(?:^|\n\n)Returns\n-{7}\n(?P<returns>[\s\S]*?))?(?:(?:^|\n\n)Examples\n-{8}\n(?P<examples>[\s\S]*))?$

The code I used to generate this regex:

import re

sep_regex = r"(?:^|\n\n)"
summary_regex  = r"(?P<base>[\s\S]+?)"
param_regex    = rf"(?:{sep_regex}Parameters\n-{{10}}\n(?P<params>[\s\S]*?))"
returns_regex  = rf"(?:{sep_regex}Returns\n-{{7}}\n(?P<returns>[\s\S]*?))"
examples_regex = rf"(?:{sep_regex}Examples\n-{{8}}\n(?P<examples>[\s\S]*))"

combined_regex = rf"^{summary_regex}??{param_regex}?{returns_regex}?{examples_regex}?$"

print(combined_regex)

Example:

from pprint import pprint

# Continues from the code above (re and combined_regex are already defined).
# text is the example docstring from the question.
match = re.search(combined_regex, text)
pprint(match.groupdict())
# out: {'base': 'Summary of class that is multiple lines.',
# out:  'examples': '>>> print()',
# out:  'params': 'param1 : str\n    Param 1 is a param',
# out:  'returns': 'value : str'}

I also tested it with various sections of the docstring dropped.
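A quick way to repeat that check is sketched below. It rebuilds the same pattern and runs it against a few docstrings with sections removed. The helper name `parse_sections` and the sample strings are illustrative assumptions. The input is stripped first, because a trailing newline (as in a triple-quoted string) would otherwise end up inside the last captured group.

```python
import re

SEP = r"(?:^|\n\n)"
PATTERN = re.compile(
    r"^(?P<base>[\s\S]+?)??"
    rf"(?:{SEP}Parameters\n-{{10}}\n(?P<params>[\s\S]*?))?"
    rf"(?:{SEP}Returns\n-{{7}}\n(?P<returns>[\s\S]*?))?"
    rf"(?:{SEP}Examples\n-{{8}}\n(?P<examples>[\s\S]*))?$"
)


def parse_sections(text):
    """Hypothetical helper: map a docstring to its four optional sections."""
    # Every group is optional, so sections that are absent come back as None.
    match = PATTERN.search(text.strip())
    return match.groupdict() if match else None


# Sample inputs (assumed for illustration) with different sections dropped.
summary_only = parse_sections("Summary only.\n")
no_summary = parse_sections("Parameters\n----------\nx : int\n")
summary_and_returns = parse_sections(
    "Do a thing.\n\nReturns\n-------\nvalue : str\n"
)

for result in (summary_only, no_summary, summary_and_returns):
    print(result)
```

Note that the pattern expects the underline to have exactly the same length as the section title, so a docstring with a longer or shorter row of dashes would fall through into the base group.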

CodePudding user response:

Instead of writing your own regular expressions, you can use existing libraries to parse docstrings, whose authors have already done the hard work for you.

Here is an example that uses the docstring-parser package. To install it, run:

pip install docstring-parser

Then you can use the following code to parse your docstring:

from docstring_parser import parse

docstring_text = """Summary of class that is multiple lines.

Parameters
----------
param1 : str
    Param 1 is a param

Returns
-------
value : str

Examples
--------
>>> print()
"""

docstring = parse(docstring_text)
docstring_info = {
    "base": docstring.short_description,
    "params": [
        {
            "name": param.arg_name,
            "type": param.type_name,
            "description": param.description,
        }
        for param in docstring.params
    ],
    "returns": {
        "name": docstring.returns.return_name,
        "type": docstring.returns.type_name,
    }
    if docstring.returns
    else {},
    "examples": [{"snippet": example.snippet} for example in docstring.examples],
}
print(docstring_info)

This gives the following output (with indentation added for clarity):

{
    "base": "Summary of class that is multiple lines.",
    "params": [{"name": "param1", "type": "str", "description": "Param 1 is a param"}],
    "returns": {"name": "value", "type": "str"},
    "examples": [{"snippet": ">>> print()"}],
}