Home > Back-end >  How can I simplify my function and make it more pythonic?
How can I simplify my function and make it more pythonic?

Time:09-27

I have written a Python function for some post processing in my text recognition algorithm. It works fine, but it seems to be kind of lengthy and has many if-else-conditions. I am looking for a way to simplify my code:

def postProcessing(new_s): #new_s is a list
  import string
    
  new_s=removeFrontLetters(new_s) #This function has to be called first
 
  if all(ch in string.digits for ch in new_s[:4]): #checking if first 4 elements are digits.
    
    if all(ch in string.ascii_letters for ch in new_s[4:5]): #checking if the [4] and [5] are letters
      
      if len(new_s)>=7:         
        if new_s[6]=='C': 
          
          new_s=new_s[:7] #if length>=7 and the [6] =='C' the reversed of the first 7 elements has to be returned.
          new_s=list(reversed(new_s)) 
          return(new_s)

          
        else:
          new_s=new_s[:6] #if length>=7 but the [6] =!'C' the reversed of the first 6 elements has to be returned.
          new_s=list(reversed(new_s))

          return(new_s)
      

      else:
        new_s=list(reversed(new_s)) #if length<7, the reversed of the given list has to be returned.
        return(new_s)
    else:
      print('not valid') #if the [4] and [5] are not letters, it is not valid
  else:
    print('not valid') #if the [:4] are not digits, it is not valid

This seems very beginner-level and lengthy. I am a beginner, but I am trying to improve my function. Do you have suggestions?

CodePudding user response:

You can invert your if statements and use early returns to reduce the indentation of your code.

def postProcessing(new_s):  # new_s is a list
    import string

    new_s = removeFrontLetters(new_s)  # This function has to be called first

    if not all(ch in string.digits for ch in new_s[:4]):  # checking if first 4 elements are digits.
        raise ValueError("First four elements must be digits")
    
    if not all(ch in string.ascii_letters for ch in new_s[4:5]):  # checking if the [4] and [5] are letters
        raise ValueError("First elements 4 and 5 must be digits")
    
    if len(new_s) <= 7:
        new_s = list(reversed(new_s))  # if length<7, the reversed of the given list has to be returned.
        return (new_s)
    
    if new_s[6] == 'C':

        new_s = new_s[
                :7]  # if length>=7 and the [6] =='C' the reversed of the first 7 elements has to be returned.
        new_s = list(reversed(new_s))
        return (new_s)

    new_s = new_s[:6]  # if length>=7 but the [6] =!'C' the reversed of the first 6 elements has to be returned.
    new_s = list(reversed(new_s))
    return (new_s)

CodePudding user response:

It's quite neat you ask 'the world' for advise. Did you know there's a dedicated stackexchange site for this? https://codereview.stackexchange.com/.

Unless you insist writing Python code for this, it seems that you need a regular expression here.

So some tips:

  • use regex for pattern matching
  • use variables to express what an expression means
  • use exceptions instead of an 'invalid value' string
  • separate the 'parsing' from the processing to keep functions small and focused
  • use doctest to document and test small functions
def post_process(new_s):
    """
    reverse a string (with a twist)

    what follows is a doctest.  You can run it with
    $ python -m doctest my_python_file.py

    >>> post_process('1234abCdef')
    'Cba4321'
    >>> post_process('1234abdef')
    'ba4321'
    """

    cmds = {
      'C': cmd_c,
      '': cmd_none
    }

    command, content = command_and_content(new_s)
    process = cmds[command]
    return process(content)


def cmd_c(content):
  return 'C'   "".join(reversed(content))
def cmd_none(content):
  return "".join(reversed(content))

The command_and_content function replaces the parsing logic:

def command_and_content(new_s):
    # get help on https://regex101.com/ to find the
    # regular expression for four digits and two letters
    digits_then_ascii = re.compile(r"(\d{4}[a-z]{2})(C?)(.*)")
    if match := digits_then_ascii.match(new_s):
      content = match.group(1)
      command = match.group(2)
      return command, content

    # pylint will ask you to not use an else clause after a return
    # Also, Python advises exceptions for notifying erroneous input
    raise ValueError(new_s)

CodePudding user response:

From the context you provided, I assume that all this processing can happen in-place (i.e. without the need to allocate additional memory). The benefit of lists is that they are mutable, so you can actually do all your operations in-place.

This adheres to the style conventions (PEP 8) and uses correct type annotations (PEP 484):

from string import digits, ascii_letters


def remove_front_letters(new_s: list[str]) -> None:
    ...
    raise NotImplementedError


def post_processing(new_s: list[str]) -> None:
    remove_front_letters(new_s)
    if any(ch not in digits for ch in new_s[:4]):
        raise ValueError("The first 4 characters must be digits")
    if any(ch not in ascii_letters for ch in new_s[4:6]):
        raise ValueError("The 5th and 6th characters must be letters")
    if len(new_s) >= 7:
        if new_s[6] == 'C':
            del new_s[7:]
        else:
            del new_s[6:]
    new_s.reverse()

If you do want a new list, you can just call this function with a .copy() of your input list.

References: list methods; del statement

PS: If you use Python version 3.8 or lower, instead of list[str] you'll need to use typing.List[str].

Also someone mentioned the possibility of replacing the iteration via all() (or any()) with a "".join(...).isdigit() for example. While this is certainly also correct and technically less code, I am not sure it is necessarily more readable. More importantly it creates a new string in the process, which I don't think is necessary.

By the way, you could even reduce that conditional deletion of list elements to a one liner like this:

...
    if len(new_s) >= 7:
        del new_s[7 if new_s[6] == 'C' else 6:]
    new_s.reverse()

But I would argue that this is worse because it is less readable. Personal preference I guess.

  • Related