Regular expression to identify a python function body and locate all the executable lines(ie non-com

Time:10-16

I have a file containing Python code (it may not be syntactically correct).

Some of its functions have been commented out entirely, except for the signature.

My goal is to detect those empty functions with a regex and clean them up.

If only # comments were involved, this would be easier: I could check whether every line between two lines starting with def begins with #. The problem is that many of the functions also contain multi-line (triple-quoted) comments.

If you can suggest a way to convert multi-line comments into single-line comments, that would help too.

In case you are curious what this is for: it is part of a Python tool with which we are trying to automate some code refactoring steps.

Input:

def this_function_has_stuff(f, g, K):
    """ Thisfunction has stuff in it """
    if f:
       s = 0
    else:
       u =0
    return None

def fuly_commented_fucntion(f, g, K):
    """
    remove this empty function.
    Examples
    ========
    >>> which function is
    >>> empty
    """

def note_this_has_one_valid_line(f, K):
    """
    Make some bla.
    Examples
    ========
    >>> bla bla
    >>> bla bla
    x**2 + 1
    """
    return [K.abs(coff) for coff in f]

def empty_with_both_types_of_comment(f, K):
    """
    my bla bla
    Examples
    ========
    3
    """
    # if not f:
    # else:
    #    return max(dup_abs(f, K))

SOME_VAR = 6

Expected output:

def this_function_has_stuff(f, g, K):
    """ Thisfunction has stuff in it """
    if f:
       s = 0
    else:
       u =0
    return None

def note_this_has_one_valid_line(f, K):
    """
    Make some bla.
    Examples
    ========
    >>> bla bla
    >>> bla bla
    x**2 + 1
    """
    return [K.abs(coff) for coff in f]

SOME_VAR = 6

CodePudding user response:

I advise against trying to do this with a regex.

Python's grammar is not a regular language, and even if you only care about a small subset of the syntax, there are so many variations and corner cases that it is not worth attempting with a regex.

Instead, I suggest exploring the ast module, which parses a source file and lets you iterate over the code as a tree. You can then check every function definition and see whether it contains any real code.

You can, for example, implement a custom NodeTransformer that removes function definitions that are effectively empty. You need to define precisely what "empty" means, but based on your question I'd say it is any function whose body contains only docstrings, pass, or ... (Ellipsis).

import ast

class Cleaner(ast.NodeTransformer):
    """Remove function definitions whose body contains no real code."""

    def __init__(self):
        self.removed = []  # names of the functions that were dropped

    def visit_FunctionDef(self, node):
        for stmt in node.body:
            if isinstance(stmt, ast.Pass):
                continue
            # docstrings and `...` are expression statements wrapping a constant
            if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
                const = stmt.value.value
                if isinstance(const, str) or const is Ellipsis:
                    continue
            break  # found a real statement, so keep the function
        else:
            # the loop never hit `break`: the body is effectively empty
            self.removed.append(node.name)
            return None  # returning None deletes the node from the tree
        return node

    def visit_AsyncFunctionDef(self, node):
        return self.visit_FunctionDef(node)

with open("my/path/to/file.py", "r") as source:
    tree = ast.parse(source.read())

cleaner = Cleaner()
cleaner.visit(tree)
print(cleaner.removed)    # ['fuly_commented_fucntion', 'empty_with_both_types_of_comment']
print(ast.unparse(tree))  # will print your source code without those functions (ast.unparse needs Python 3.9+)

This approach has a few limitations you should be aware of:

  • ast does not work on syntactically incorrect source.
  • ast.parse discards comments, so after unparsing, all comments will be gone.
  • a function with an unimplemented body may still be referenced elsewhere in the code, so removing functions solely because their body is empty is not safe.
  • this implementation does not look into nested functions. That could be added (call self.generic_visit(node) inside the visitor methods), but it raises a question: is a function whose body contains only empty nested functions itself empty?
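
As a rough sketch of the nested-function variant from the last point, the transformer below calls generic_visit first and then treats a function left with only filler statements (or with nothing at all) as empty. That answers the open question with "yes", which is one possible choice and not the only reasonable one. The names NestedCleaner and _is_filler are made up for this illustration, and the ClassDef handler is an extra assumption so that a class whose methods were all removed still unparses to valid code.

```python
import ast

class NestedCleaner(ast.NodeTransformer):
    """Hypothetical variant of Cleaner that also descends into nested scopes."""

    def __init__(self):
        self.removed = []

    @staticmethod
    def _is_filler(stmt):
        # `pass`, a bare string literal (docstring) or `...`
        if isinstance(stmt, ast.Pass):
            return True
        return (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and (isinstance(stmt.value.value, str) or stmt.value.value is Ellipsis)
        )

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # clean nested definitions first
        # all() is also True for an empty body, i.e. when every nested def was removed
        if all(self._is_filler(stmt) for stmt in node.body):
            self.removed.append(node.name)
            return None
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_ClassDef(self, node):
        self.generic_visit(node)
        if not node.body:
            node.body = [ast.Pass()]  # keep the class syntactically valid
        return node

src = '''
def outer():
    """doc"""
    def inner():
        pass

def real():
    def helper():
        ...
    return 1

class C:
    def m(self):
        """todo"""
'''

cleaner = NestedCleaner()
tree = cleaner.visit(ast.parse(src))
result = ast.unparse(tree)  # Python 3.9+
print(cleaner.removed)
print(result)
```

Inner functions are reported before the function that contains them, because the children are visited first.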

One thing you can do, instead of unparsing the tree, is to use it only to identify the names of the unimplemented functions, and then use a regular expression to find and remove their definitions (for example, see the answer from @megaultron below).
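
A minimal sketch of that idea, with one substitution stated plainly: rather than a regular expression, it uses the lineno and end_lineno attributes that ast records (Python 3.8+) to cut the functions out of the original text, so comments and formatting elsewhere survive. The helper names is_empty and strip_empty_functions and the SAMPLE string are hypothetical, and swallowing the indented comment lines and blank lines that trail an empty function is an assumption about what "clean up" should mean. It only looks at top-level functions and still requires the source to parse.

```python
import ast

def is_empty(func):
    """True if the body holds only string literals (docstrings), `pass`, or `...`."""
    for stmt in func.body:
        if isinstance(stmt, ast.Pass):
            continue
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            if isinstance(stmt.value.value, str) or stmt.value.value is Ellipsis:
                continue
        return False
    return True

def strip_empty_functions(source):
    """Return (cleaned_source, removed_names); top-level functions only."""
    lines = source.splitlines(keepends=True)
    drop = set()   # 1-based line numbers to delete
    removed = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not is_empty(node):
            continue
        removed.append(node.name)
        # decorators sit above the `def` line, so start at the earliest of them
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno
        # lines[end] (0-based) is the line right after the function's last statement
        while end < len(lines):
            nxt = lines[end]
            is_blank = nxt.strip() == ""
            is_indented_comment = nxt[0] in " \t" and nxt.lstrip().startswith("#")
            if not (is_blank or is_indented_comment):
                break
            end += 1
        drop.update(range(start, end + 1))
    kept = [ln for no, ln in enumerate(lines, start=1) if no not in drop]
    return "".join(kept), removed

SAMPLE = '''\
def keeps_body(f):
    # this comment survives
    return f

def only_docstring(f):
    """
    Nothing but a docstring.
    """

def docstring_and_comments(f):
    """Doc."""
    # if not f:
    #    return 0

SOME_VAR = 6
'''

cleaned, removed = strip_empty_functions(SAMPLE)
print(removed)
print(cleaned)
```

Because the kept lines are copied verbatim, the comment inside keeps_body is still there afterwards, which would not be the case with ast.unparse.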

CodePudding user response:

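Use the following regex:

(def (?!fuly_commented_fucntion|empty_with_both_types_of_comment).*(?:\n.+)+)

(?!...) is a negative lookahead that rejects the functions named in it.

(?:\n.+)+ consumes the line breaks and the non-empty lines that follow, i.e. the function body up to the first blank line.

match.group(groupNum) in the code below contains each kept function as a string.

The complete code:

import re

# regex: keep every def whose name is not in the lookahead list
regex = r"(def (?!fuly_commented_fucntion|empty_with_both_types_of_comment).*(?:\n.+)+)"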

test_str = ("\n"
    "def this_function_has_stuff(f, g, K):\n"
    "    \"\"\" Thisfunction has stuff in it \"\"\"\n"
    "    if f:\n"
    "       s = 0\n"
    "    else:\n"
    "       u =0\n"
    "    return None\n\n"
    "def fuly_commented_fucntion(f, g, K):\n"
    "    \"\"\"\n"
    "    remove this empty function.\n"
    "    Examples\n"
    "    ========\n"
    "    >>> which function is\n"
    "    >>> empty\n"
    "    \"\"\"\n\n"
    "def note_this_has_one_valid_line(f, K):\n"
    "    \"\"\"\n"
    "    Make some bla.\n"
    "    Examples\n"
    "    ========\n"
    "    >>> bla bla\n"
    "    >>> bla bla\n"
    "    x**2   1\n"
    "    \"\"\"\n"
    "    return [K.abs(coff) for coff in f]\n\n"
    "def empty_with_both_types_of_comment(f, K):\n"
    "    \"\"\"\n"
    "    my bla bla\n"
    "    Examples\n"
    "    ========\n"
    "    3\n"
    "    \"\"\"\n"
    "    # if not f:\n"
    "    # else:\n"
    "    #    return max(dup_abs(f, K))\n\n"
    "SOME_VAR = 6")

matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    # group 0 is the whole match; the captured groups start at 1
    for groupNum in range(1, len(match.groups()) + 1):
        print('==============your methods=====================')
        print(match.group(groupNum))
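
The function names do not have to be hardcoded in the pattern. As a hedged sketch, the hypothetical helper build_keep_regex below assembles the same kind of regex from a list of names, escaping each one with re.escape. It also adds a \b after the alternatives, which is a change from the pattern above: without it, the lookahead would also reject any function whose name merely starts with one of the listed names. SOURCE and the function names in it are invented for the example.

```python
import re

def build_keep_regex(names_to_drop):
    """Compile a pattern matching every def block whose name is NOT in names_to_drop."""
    if not names_to_drop:
        # nothing to exclude: match every def block
        return re.compile(r"(def .*(?:\n.+)+)")
    alternatives = "|".join(re.escape(name) for name in names_to_drop)
    # \b stops `empty_fn` from also excluding `empty_fn_v2`
    return re.compile(rf"(def (?!(?:{alternatives})\b).*(?:\n.+)+)")

SOURCE = (
    "def real(x):\n"
    "    return x\n"
    "\n"
    "def empty_fn(x):\n"
    '    """Only a docstring."""\n'
    "\n"
    "def empty_fn_v2(x):\n"
    "    return x + 1\n"
)

pattern = build_keep_regex(["empty_fn"])
kept = [m.group(1) for m in pattern.finditer(SOURCE)]
for block in kept:
    print("==============your methods=====================")
    print(block)
```

As with the original pattern, a match ends at the first blank line, so a function with blank lines inside its body is cut short, and anything that is not a def block (such as SOME_VAR = 6 in the question) is not part of any match.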



 